Deleting words/strings containing a specific chara

2019-08-01 06:51发布

I'm writing a query to extract text that was entered through a vendor-created word processor to an Oracle database and I need to export it to Word or Excel. The text is entered into a memo field and the text is intertwined with codes that the word processor uses for different functions (bold, indent, hard return, font size, etc.).

I've used the replace function to parse out a lot of the more common codes, but there are so many variations, it's nearly impossible to catch them all. Is there a way to do this? Unfortunately, I'm limited to using Microsoft Access 2010 to try and accomplish this.

The common thread I've found is that all the codes start with a back-slash and I'd like to be able to delete all strings that start with a back-slash up to the next space so all the codes are stripped out of the final text.

Here's a brief example of the text I'm working with:

  {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;

    \viewkind4\uc1\pard\f0\fs36 An abbreviated survey was conducted
    on 02/02/15 to investigate complaint #OK000227. \par  
    No deficiencies were cited.\par

    \fs20\par
}}

2条回答
Ridiculous、
2楼-- · 2019-08-01 07:21

The text you are working with is RTF. Here is a tutorial about the file format.

This link (on another site, registration required) may give you copy & paste code you can use to convert rtf fields to txt.

You may be able to copy the value of the field from the database and paste it into notepad and then save the notepad file as "test.rtf"...you could then double click the file icon and the document may open.

RTF is an old MS file format that allows formatting of text. See this wikipedia page.

查看更多
Summer. ? 凉城
3楼-- · 2019-08-01 07:23

If your machine has Microsoft Word installed then you already have an RTF parser available so you don't have to "roll your own". You can just get Word to open the RTF document and save it as plain text like this:

Option Compare Database
Option Explicit

Public Function RtfToPlainText(rtfText As Variant) As Variant
    Dim rtn As Variant
    Dim tempFolder As String, rtfPath As String, txtPath As String
    Dim fso As Object  ' FileSystemObject
    Dim f As Object  ' TextStream
    Dim wordApp As Object  ' Word.Application
    Dim wordDoc As Object  ' Word.Document
    Dim tempFileName As String
    tempFileName = "~RtfToPlainText"

    If IsNull(rtfText) Then
        rtn = Null
    Else
        ' save RTF text as file
        Set fso = CreateObject("Scripting.FileSystemObject")
        tempFolder = fso.GetSpecialFolder(2)  ' Temporaryfolder
        rtfPath = tempFolder & "\" & tempFileName & ".rtf"
        Set f = fso.CreateTextFile(rtfPath)
        f.Write rtfText
        f.Close
        Set f = Nothing

        ' open in Word and save as plain text
        Set wordApp = CreateObject("Word.Application")
        Set wordDoc = wordApp.Documents.Open(rtfPath)
        txtPath = tempFolder & "\" & tempFileName & ".txt"
        wordDoc.SaveAs2 txtPath, 2  ' wdFormatText
        wordDoc.Close False
        Set wordDoc = Nothing
        wordApp.Quit False
        Set wordApp = Nothing
        fso.DeleteFile rtfPath

        ' retrieve plain text
        Set f = fso.OpenTextFile(txtPath)
        rtn = f.ReadAll
        f.Close
        Set f = Nothing
        fso.DeleteFile txtPath
        Set fso = Nothing
    End If
    RtfToPlainText = rtn
End Function

Then, if you had a table with two Memo fields - [rtfText] and [plainText] - you could extract the plain text into the second Memo field using the following query in Access:

UPDATE rtfTestTable SET plainText = RtfToPlainText([rtfText]);
查看更多
登录 后发表回答