Channel: Topic Tag: xml | WordPress.org

HarveyKane on "P and BR tags missing in XML export"


Hi everyone,

I'm working on a piece of software that reads and parses WordPress XML export files (containing posts and pages).

I'm having trouble with a number of XML files that don't seem to have any P or BR tags to mark new lines in the content field. However, the content does include other HTML tags such as UL and LI.

Example XML looks something like this...

<content:encoded><![CDATA[This is a paragraph.

Another paragraph.

<ul>
<li>Bullet list</li>
<li>Bullet list</li>
</ul>
&nbsp;]]></content:encoded>

Currently my script treats this as HTML content, so I end up with all the text on one line: "This is a paragraph. Another paragraph."

However, if I use the PHP nl2br() function to add in the missing line breaks, I end up with this...
<ul><br /><li>Bullet list</li><br /><li>Bullet list</li><br /></ul>

Does anyone have a method of parsing this pseudo-HTML in the XML files so that the line breaks are retained? I notice that the original site has the P tags in the correct places, so something about the export must be stripping them. Unfortunately I'm not the person generating the export file, so I have no control over this.
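
To make the goal concrete, here is a minimal sketch of the kind of conversion I'm after, written in Python only for illustration. It assumes that blank lines separate paragraphs and that a small, hand-picked set of block-level tags should be left alone. The function name `add_paragraphs` and the tag list are my own placeholders, not anything from WordPress. As far as I know, WordPress itself adds the P tags at display time with its `wpautop()` function, which handles many more cases than this.

```python
import re

# Hypothetical, incomplete list of block-level tags that must not be
# wrapped in <p> or have <br /> inserted around them.
BLOCK = r"(?:ul|ol|li|table|thead|tbody|tr|td|th|div|blockquote|pre|h[1-6]|hr)"


def add_paragraphs(text: str) -> str:
    """Wrap plain-text chunks in <p> tags, leaving block-level tags untouched."""
    # Normalise line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Put blank lines around block-level tags so each one becomes its own chunk.
    text = re.sub(rf"(<{BLOCK}\b[^>]*>)", r"\n\n\1", text, flags=re.I)
    text = re.sub(rf"(</{BLOCK}>)", r"\1\n\n", text, flags=re.I)

    out = []
    # One or more blank lines separate chunks.
    for chunk in re.split(r"\n\s*\n", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        if re.match(rf"</?{BLOCK}\b", chunk, flags=re.I):
            # Chunk starts with a block-level tag: keep it as-is.
            out.append(chunk)
        else:
            # Plain text: single newlines become <br />, the chunk becomes a <p>.
            out.append("<p>" + chunk.replace("\n", "<br />\n") + "</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "This is a paragraph.\n\nAnother paragraph.\n\n"
        "<ul>\n<li>Bullet list</li>\n<li>Bullet list</li>\n</ul>\n&nbsp;"
    )
    print(add_paragraphs(sample))
```

This is deliberately naive: text that spans blank lines inside a DIV, or the contents of a PRE block, would be handled incorrectly, so it is a starting point rather than a full solution.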

Has anyone come across this before or have any ideas?

Thanks in advance :)

