I'm writing a query to extract text that was entered through a vendor-created word processor to an Oracle database and I need to export it to Word or Excel. The text is entered into a memo field and the text is intertwined with codes that the word processor uses for different functions (bold, indent, hard return, font size, etc.).
I've used the replace function to parse out a lot of the more common codes, but there are so many variations, it's nearly impossible to catch them all. Is there a way to do this? Unfortunately, I'm limited to using Microsoft Access 2010 to try and accomplish this.
The common thread I've found is that all the codes start with a back-slash and I'd like to be able to delete all strings that start with a back-slash up to the next space so all the codes are stripped out of the final text.
Here's a brief example of the text I'm working with:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;
\viewkind4\uc1\pard\f0\fs36 An abbreviated survey was conducted
on 02/02/15 to investigate complaint #OK000227. \par
No deficiencies were cited.\par
\fs20\par
}}
The text you are working with is RTF. Here is a tutorial about the file format.
This link (on another site, registration required) may give you copy & paste code you can use to convert rtf fields to txt.
You may be able to copy the value of the field from the database and paste it into notepad and then save the notepad file as "test.rtf"...you could then double click the file icon and the document may open.
RTF is an old MS file format that allows formatting of text. See this wikipedia page.
If your machine has Microsoft Word installed then you already have an RTF parser available so you don't have to "roll your own". You can just get Word to open the RTF document and save it as plain text like this:
Then, if you had a table with two Memo fields - [rtfText] and [plainText] - you could extract the plain text into the second Memo field using the following query in Access: