PHPWord: read Word documents while preserving the styles

Posted 2020-07-24 05:50

Question:

I have successfully extracted the text content of a Word file using PHPWord and the following code:

<?php
require_once 'vendor/autoload.php';

// Read contents
$name = 'linux';
$source = "{$name}.docx";
echo date('H:i:s'), " Reading contents from {$source} <hr>";
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source);

// Walk sections -> elements -> text runs -> text pieces, printing only the text
// (Text elements that sit directly in a section, outside a TextRun, are skipped)
foreach ($phpWord->getSections() as $section) {
    foreach ($section->getElements() as $element) {
        if ($element instanceof \PhpOffice\PhpWord\Element\TextRun) {
            // A TextRun groups the differently formatted pieces of one paragraph
            foreach ($element->getElements() as $textElement) {
                if ($textElement instanceof \PhpOffice\PhpWord\Element\Text) {
                    echo $textElement->getText();
                    echo "<br>";
                }
            }
        }
    }
}

This displays only plain text. I also tried PHP's native zip functions to fetch the text with the following code, which also gives me only plain text:

function read_docx($filename){

    $striped_content = '';
    $content = '';

    if(!$filename || !file_exists($filename)) return false;

    // Note: the procedural zip_* functions are deprecated as of PHP 8.0 (ZipArchive replaces them)
    $zip = zip_open($filename);
    if (!$zip || is_numeric($zip)) return false; // zip_open() returns an error number on failure

    while ($zip_entry = zip_read($zip)) {

        // Only word/document.xml (the main body) is needed; check the name before opening the entry
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;

        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }
    zip_close($zip);
    //$content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    //$content = str_replace('</w:r></w:p>', "\r\n", $content);

    // strip_tags() removes all markup, including the formatting tags (<w:b/>, <w:jc/>, ...)
    $striped_content = strip_tags($content);

    return $striped_content;
}
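Since the procedural zip_* functions are deprecated as of PHP 8.0, the same read can be sketched with the ZipArchive class (this assumes the zip extension is installed). The helper name read_docx_xml is hypothetical, and unlike read_docx() it returns the raw XML instead of calling strip_tags(), because the tags are what carry the formatting:

```php
<?php
/**
 * Hypothetical helper: return the raw WordprocessingML of a .docx file
 * (the contents of word/document.xml), or false on failure.
 * Assumes the zip extension (ZipArchive) is available.
 */
function read_docx_xml(string $filename)
{
    if ($filename === '' || !is_file($filename)) {
        return false;
    }

    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        return false; // open() returns an error code instead of true on failure
    }

    // getFromName() returns the entry's contents, or false if the entry is missing
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();

    // Keep the tags: <w:b/>, <w:i/>, <w:jc/>, ... are where the formatting lives
    return $xml;
}
```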

I want to extract the Word contents and display them in CKEditor/TinyMCE with the styles preserved. When I copy from a Word document into CKEditor directly, the styles are preserved; I want the same behavior, but done in code. I also want the same functionality for the following file formats:

1) LibreOffice/OpenOffice (.odt, .sxw)
2) Microsoft Works (.wps)
3) WinWord, WordPad (.rtf)
4) WordPerfect (.wp, .wpd)
5) HTML (.htm, .html)
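For these other formats, PHPWord's IOFactory::load() takes a reader name as its second argument. To my knowledge PHPWord ships readers for .docx (Word2007), .odt (ODText), .rtf (RTF) and .html (HTML), but none for .sxw, .wps, .wp or .wpd, so those would have to be converted to a supported format first. The hypothetical helper below only sketches picking the reader name from the file extension; the reader list should be checked against the PHPWord version in use:

```php
<?php
/**
 * Hypothetical helper: map a file extension to the PHPWord reader name that
 * would be passed to \PhpOffice\PhpWord\IOFactory::load($file, $readerName).
 * Returns null when (to my knowledge) PHPWord has no reader for the format.
 */
function phpword_reader_for(string $filename): ?string
{
    $readers = [
        'docx' => 'Word2007',
        'odt'  => 'ODText',
        'rtf'  => 'RTF',
        'htm'  => 'HTML',
        'html' => 'HTML',
    ];

    // Lower-case so "Report.DOCX" and "report.docx" behave the same
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    // .sxw, .wps, .wp, .wpd (and anything else) fall through to null
    return $readers[$ext] ?? null;
}
```

A caller would then load the file with \PhpOffice\PhpWord\IOFactory::load($source, $reader) only when $reader is not null.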

What is the best way to do this in PHP? Are there libraries that can achieve this?

This is the raw XML that the zip function code reads from word/document.xml (before strip_tags() is applied):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14"><w:body><w:p w14:paraId="676FD033" w14:textId="77777777" w:rsidR="00E43633" w:rsidRPr="007E5F00" w:rsidRDefault="007E5F00" w:rsidP="007E5F00"><w:pPr><w:jc w:val="center"/><w:rPr><w:b/><w:sz w:val="28"/></w:rPr></w:pPr><w:r w:rsidRPr="007E5F00"><w:rPr><w:b/><w:sz w:val="28"/></w:rPr><w:t>Introduction to Linux</w:t></w:r></w:p><w:p w14:paraId="47931EDA" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRDefault="007E5F00" w:rsidP="007E5F00"><w:pPr><w:rPr><w:b/></w:rPr></w:pPr><w:r w:rsidRPr="007E5F00"><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Linux </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>commands:</w:t></w:r></w:p><w:p w14:paraId="7213615E" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRPr="007E5F00" w:rsidRDefault="007E5F00" w:rsidP="007E5F00"><w:proofErr w:type="spellStart"/><w:r><w:rPr><w:b/></w:rPr><w:t>mkdir</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> - </w:t></w:r><w:r w:rsidRPr="007E5F00"><w:t>This command is used to create a new directory.</w:t></w:r></w:p><w:p w14:paraId="6C8B3A9F" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRDefault="007E5F00"><w:r><w:t xml:space="preserve">Syntax: </w:t></w:r><w:proofErr w:type="spellStart"/><w:r><w:t>mkdir</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:t xml:space="preserve"> [directory name]</w:t></w:r></w:p><w:p w14:paraId="3454E01F" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRDefault="007E5F00"><w:pPr><w:rPr><w:b/></w:rPr></w:pPr><w:proofErr w:type="spellStart"/><w:r><w:rPr><w:b/></w:rPr><w:t>rmdir</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> – This </w:t></w:r><w:proofErr w:type="spellStart"/><w:r><w:rPr><w:b/></w:rPr><w:t>commnd</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> is used to remove a directory. 
</w:t></w:r></w:p><w:p w14:paraId="3726D091" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRPr="00E07056" w:rsidRDefault="007E5F00"><w:r w:rsidRPr="00E07056"><w:t xml:space="preserve">Syntax: </w:t></w:r><w:proofErr w:type="spellStart"/><w:r w:rsidRPr="00E07056"><w:t>rmdir</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r w:rsidRPr="00E07056"><w:t xml:space="preserve"> [directory name]</w:t></w:r></w:p><w:p w14:paraId="671C1334" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRDefault="00644552"><w:pPr><w:rPr><w:b/></w:rPr></w:pPr><w:r><w:rPr><w:b/></w:rPr><w:t>c</w:t></w:r><w:r w:rsidR="007E5F00"><w:rPr><w:b/></w:rPr><w:t>at – This command is used to create a file.</w:t></w:r><w:r w:rsidR="007E5F00"><w:rPr><w:b/></w:rPr><w:br/></w:r><w:r w:rsidR="007E5F00"><w:rPr><w:b/></w:rPr><w:br/></w:r><w:r w:rsidR="007E5F00" w:rsidRPr="00E07056"><w:t>Syntax: cat &gt; [file name]</w:t></w:r></w:p><w:p w14:paraId="3B19AB80" w14:textId="781534E0" w:rsidR="007E5F00" w:rsidRDefault="007E5F00"><w:pPr><w:rPr><w:b/></w:rPr></w:pPr><w:r><w:rPr><w:b/></w:rPr><w:t>rm – This command is used to delete a file.</w:t></w:r></w:p><w:p w14:paraId="62D51FC5" w14:textId="77777777" w:rsidR="007E5F00" w:rsidRPr="00E07056" w:rsidRDefault="007E5F00"><w:r w:rsidRPr="00E07056"><w:t>Syntax: rm [file name]</w:t></w:r></w:p><w:p w14:paraId="2530B357" w14:textId="77777777" w:rsidR="00644552" w:rsidRDefault="00644552"><w:pPr><w:rPr><w:b/></w:rPr></w:pPr><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">ls – This command is used to show the list of files. 
</w:t></w:r></w:p><w:p w14:paraId="34187575" w14:textId="20F3787A" w:rsidR="007E5F00" w:rsidRDefault="00644552"><w:r w:rsidRPr="00E07056"><w:t>Syntax: ls</w:t></w:r></w:p><w:p w14:paraId="3761997E" w14:textId="50B84C05" w:rsidR="005F3884" w:rsidRDefault="005F3884"><w:proofErr w:type="spellStart"/><w:r><w:rPr><w:b/></w:rPr><w:t>nano</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> – </w:t></w:r><w:r><w:t>This command is used to edit a file.</w:t></w:r></w:p><w:p w14:paraId="494935E1" w14:textId="59787D6D" w:rsidR="005F3884" w:rsidRPr="005F3884" w:rsidRDefault="005F3884"><w:r><w:t xml:space="preserve">Syntax: </w:t></w:r><w:proofErr w:type="spellStart"/><w:r><w:t>nano</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r><w:t xml:space="preserve"> [file name]</w:t></w:r></w:p><w:p w14:paraId="5E9230F7" w14:textId="1AFCF21C" w:rsidR="005F3884" w:rsidRDefault="005F3884"><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">cd – </w:t></w:r><w:r><w:t xml:space="preserve">This command is used to change the </w:t></w:r><w:r w:rsidR="008D2F14"><w:t>current working directory.</w:t></w:r></w:p><w:p w14:paraId="74E0E24D" w14:textId="0434C3B6" w:rsidR="00584224" w:rsidRPr="005F3884" w:rsidRDefault="00584224"><w:r><w:t xml:space="preserve">Syntax: </w:t></w:r><w:r><w:t>cd</w:t></w:r><w:r><w:t xml:space="preserve"> [</w:t></w:r><w:r><w:t>directory name</w:t></w:r><w:r><w:t>]</w:t></w:r></w:p><w:p w14:paraId="44838DE3" w14:textId="00375B7C" w:rsidR="005F3884" w:rsidRDefault="00584224"><w:proofErr w:type="spellStart"/><w:r><w:rPr><w:b/></w:rPr><w:t>p</w:t></w:r><w:r w:rsidR="005F3884"><w:rPr><w:b/></w:rPr><w:t>wd</w:t></w:r><w:proofErr w:type="spellEnd"/><w:r w:rsidR="005F3884"><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> present working directory</w:t></w:r><w:r w:rsidR="008D2F14"><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> – </w:t></w:r><w:r w:rsidR="008D2F14"><w:t>This command is used to display the present working 
directory.</w:t></w:r></w:p><w:p w14:paraId="270B9CFF" w14:textId="0D62DC9E" w:rsidR="00584224" w:rsidRPr="008D2F14" w:rsidRDefault="00584224"><w:r><w:t xml:space="preserve">Syntax: </w:t></w:r><w:proofErr w:type="spellStart"/><w:r><w:t>pwd</w:t></w:r><w:proofErr w:type="spellEnd"/></w:p><w:p w14:paraId="67F33FB5" w14:textId="2680AAEB" w:rsidR="005F3884" w:rsidRDefault="008D2F14"><w:r><w:rPr><w:b/></w:rPr><w:t>c</w:t></w:r><w:r w:rsidR="005F3884"><w:rPr><w:b/></w:rPr><w:t>d</w:t></w:r><w:proofErr w:type="gramStart"/><w:r w:rsidR="005F3884"><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> ..</w:t></w:r><w:proofErr w:type="gramEnd"/><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> – </w:t></w:r><w:r><w:t>This command is used to move a directory back.</w:t></w:r></w:p><w:p w14:paraId="3129D383" w14:textId="53F5569B" w:rsidR="00376F17" w:rsidRPr="00376F17" w:rsidRDefault="00376F17"><w:r><w:t>Syntax: cd ..</w:t></w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/></w:p><w:sectPr w:rsidR="00376F17" w:rsidRPr="00376F17" w:rsidSect="004F205D"><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docGrid w:linePitch="360"/></w:sectPr></w:body></w:document>

I only need the underline, bold, italic, and text-centering formatting.
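In the XML above, each w:p element is a paragraph and each w:r is a run of text whose direct formatting sits in w:rPr: w:b (bold), w:i (italic) and w:u (underline, where w:val="none" means no underline). Centering is w:jc w:val="center" inside the paragraph's w:pPr. A w:rPr nested in w:pPr only formats the paragraph mark, so the text formatting has to be read from each run. Below is a minimal sketch, using only DOMDocument and DOMXPath, that turns the raw document.xml string into simple HTML an editor such as CKEditor or TinyMCE can load. The function name docx_xml_to_html is hypothetical, and it assumes all formatting is applied directly: styles defined in styles.xml, tables, lists and images are not handled.

```php
<?php
// WordprocessingML main namespace (the "w:" prefix in document.xml)
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Hypothetical converter: raw word/document.xml -> simple HTML that keeps
 * only bold, italic, underline and centered paragraphs.
 * Only direct formatting is read; styles.xml, tables, lists and images are ignored.
 */
function docx_xml_to_html(string $xml): string
{
    if (trim($xml) === '') {
        return '';
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // don't print parser warnings
    $ok = $doc->loadXML($xml);
    libxml_use_internal_errors($prev);
    if (!$ok) {
        return '';
    }
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('w', W_NS);

    // Toggles like <w:b/> are on unless w:val says otherwise; <w:u> uses "none" for off
    $isOn = function (DOMElement $rPr, string $name) use ($xp): bool {
        $el = $xp->query('w:' . $name, $rPr)->item(0);
        if (!$el instanceof DOMElement) {
            return false;
        }
        $val = $el->getAttributeNS(W_NS, 'val');
        return !in_array($val, ['0', 'false', 'off', 'none'], true);
    };

    $html = '';
    foreach ($xp->query('//w:body//w:p') as $p) {
        // Paragraph alignment lives in w:pPr/w:jc
        $jc = $xp->query('w:pPr/w:jc', $p)->item(0);
        $centered = $jc instanceof DOMElement && $jc->getAttributeNS(W_NS, 'val') === 'center';
        $html .= $centered ? '<p style="text-align:center">' : '<p>';

        // Runs carry the text formatting; w:pPr/w:rPr only formats the paragraph mark
        foreach ($xp->query('w:r | w:hyperlink/w:r', $p) as $r) {
            $text = '';
            foreach ($r->childNodes as $child) {
                if (!$child instanceof DOMElement || $child->namespaceURI !== W_NS) {
                    continue;
                }
                if ($child->localName === 't') {
                    $text .= htmlspecialchars($child->textContent, ENT_QUOTES, 'UTF-8');
                } elseif ($child->localName === 'br') {
                    $text .= '<br>';
                }
            }
            if ($text === '') {
                continue;
            }
            // Wrap innermost to outermost so the tags nest correctly
            $rPr = $xp->query('w:rPr', $r)->item(0);
            if ($rPr instanceof DOMElement) {
                if ($isOn($rPr, 'u')) { $text = "<u>$text</u>"; }
                if ($isOn($rPr, 'i')) { $text = "<em>$text</em>"; }
                if ($isOn($rPr, 'b')) { $text = "<strong>$text</strong>"; }
            }
            $html .= $text;
        }
        $html .= "</p>\n";
    }
    return $html;
}
```

For the file in the question, the first paragraph would come out as <p style="text-align:center"><strong>Introduction to Linux</strong></p>.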