RE: Tools: PDF to SQL

Subject: RE: Tools: PDF to SQL
From: Ed Klopfenstein <eklopfenstein -at- proclarity -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Fri, 23 Aug 2002 09:27:15 -0600


Matthew Horn wrote:

*************************************************
1. Export the PDF as an RTF document
2. Open RTF in favorite word processor (Word).
3. Convert to simple text format (TXT).
4. Buy a book on Perl. Laura LeMay's book will get you up to speed fast. And
I believe she is a member of this list.
5. Write script that uses Perl's text-manipulation functions to extract data
from text file.
6. Extend this script or write another script that converts the extracted
data into SQL statements.
7. Insert SQL into database.
*************************************************
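
For anyone who does want to script steps 5 and 6, here is a rough sketch in Python rather than Perl. Everything specific in it is an assumption made for illustration: the "name ..... quantity" line layout, the table name "parts", and the column names "name" and "quantity" are hypothetical and would need to match your own exported text and database.

```python
import re

# Hypothetical line layout: "<name> ..... <quantity>", e.g. "Widget ........ 12"
LINE_PATTERN = re.compile(r"^(?P<name>.+?)\s*\.{2,}\s*(?P<qty>[0-9]+)$")


def sql_quote(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def text_to_inserts(lines, table="parts"):
    """Return (statements, skipped): INSERT statements for matching lines,
    plus the non-blank lines that did not fit the assumed layout."""
    statements = []
    skipped = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = LINE_PATTERN.match(line)
        if match is None:
            skipped.append(line)  # leave these for review by hand
            continue
        name = sql_quote(match.group("name"))
        qty = int(match.group("qty"))
        statements.append(
            f"INSERT INTO {table} (name, quantity) VALUES ({name}, {qty});"
        )
    return statements, skipped


sample = ["Widget ........ 12", "O'Ring .... 7", "Page 3 of 10"]
statements, skipped = text_to_inserts(sample)
for statement in statements:
    print(statement)
print("Skipped:", skipped)
```

The skipped list matters as much as the statements: text exported from a PDF tends to contain page headers and stray lines, and it is safer to review those by hand than to guess at them.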

I've used a similar technique, but instead of learning Perl (which could take
time -- it's not called the "toothpick" language for its easy-to-understand
syntax), I would suggest either using Word's search and replace capabilities
to massage the data or creating a quick macro to clean it up. You could
also do this by hand if you don't know how to code.

What you want is a final text file that uses some unique delimiter. I like
double colons (::) since they're less likely than commas to appear in the
data itself. SQL Server's DTS Wizard can pull data from structured text files
as easily as Access's import wizard -- just make sure you have the same number
of delimited columns across all rows. If you're more visually oriented,
creating a Word table and then converting the table to delimited text might
also be an easy solution. Look up tables and delimiters in Word's Help files.
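
If you'd rather check the column counts before running the import, a small script can do it. This is only a sketch in Python; the "::" delimiter is the one suggested above, while the sample rows and the function name are made up for illustration.

```python
def find_ragged_rows(lines, delimiter="::"):
    """Return (line_number, field_count) for every non-blank line whose
    field count differs from that of the first non-blank line."""
    expected = None
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # blank lines are ignored, not counted as rows
        count = len(line.split(delimiter))
        if expected is None:
            expected = count  # the first row sets the expected width
        elif count != expected:
            problems.append((number, count))
    return problems


sample = ["Widget::12::blue", "Gasket::7", "Bolt::40::grey"]
print(find_ragged_rows(sample))  # prints [(2, 2)]
```

To check a real file, pass the open file object instead of the sample list, since iterating over a file yields its lines one at a time.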

A final hint is to make sure the data in each column matches that column's
data type in SQL Server. For instance, if you're importing text into a column
that SQL Server expects to be an integer, you're going to generate an error.
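
One way to catch type mismatches before the import is to scan the column first. The sketch below is Python and assumes a "::"-delimited file whose second column should hold integers; the sample rows are invented. It checks only that each value looks like a whole number, not that it fits the range of a particular SQL Server integer type.

```python
import re

# Optional sign followed by ASCII digits only
INTEGER = re.compile(r"[+-]?[0-9]+")


def non_integer_values(rows, column_index):
    """Return (row_number, value) for each row whose value in the given
    column is not a plain whole number."""
    bad = []
    for number, row in enumerate(rows, start=1):
        value = row[column_index].strip()
        if not INTEGER.fullmatch(value):
            bad.append((number, value))
    return bad


lines = ["Widget::12", "Gasket::seven", "Bolt::40"]
rows = [line.split("::") for line in lines]
print(non_integer_values(rows, 1))  # prints [(2, 'seven')]
```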

Feel free to contact me offline. I'd be happy to help or give you some
quickie macros to provide a starting place.

Regards,

Ed Klopfenstein


Technical Writer
ProClarity Corporation
PO Box 8064
Boise, ID 83707
eklopfenstein -at- proclarity -dot- com
http://www.proclarity.com





