What would be the best way to insert XML data (that I get from a webpage) into PostgreSQL database?
I'm using Java and need a little help finding a good way to read this data into the database.
相关问题
- Delete Messages from a Topic in Apache Kafka
- Jackson Deserialization not calling deserialize on
- Illegal to have multiple roots (start tag in epilo
- How to maintain order of key-value in DataFrame sa
- StackExchange API - Deserialize Date in JSON Respo
Postgres has (thanks to Daniel Lyons for pointing it out) native XML support which you can use to store your table. If you however want to shred your XML data manually, there are different possibilities to represent XML data in a database. The first question should be, if you want a very generic solution, that will be able to store any XML document or one that is specific to your domain (i.e. only allows XML documents of a certain structure). Depending on that, you will have a very flexible, universal representation which is however harder to query (the SQL needed will be quite complicated). If you have a more specific approach, the queries will be simpler, but you will need to create new tables or add new attributes to existing talbes every time you want to store another type of document or add a field to an existing document; so changing the schema will be harder (which is one major advantage of XML). This presentation should give you some ideas what are the different possibilities.
Also, you might consider to switch to some DB that supports Xquery, like DB2. The ability to natively query using XQuery, a language targeted at processing XML, will simplify things a lot.
UPDATE: Given your comment, your XML data (that you linked to) is perfectly relational. It can be mapped 1:1 to the following table:
So any
mynt
tag would be a record in the table and the corresponding sub-tags the attributes. The data types I gathered from your data, they might be wrong. The main problem is, IMO, that there is no natural primary key, so I added an autogenerated one.I have a working implementation where I do everything inside PostgreSQL without additional libraries.
Auxiliary parsing function
Handle multiple values
The above implementation doesn't handle multiple attributes at one xpath. Here is an overloaded version of
f_xml_extract_val()
for that. With the 3rd parameter you can pickone
(the first),all
ordist
(distinct) values. Multiple values are aggregated to a comma-separated string.Call:
Main part
Name of target table:
tbl
; prim. key:id
:Important points:
This implementation checks on a primary key if the inserted row exists already and updates in this case. Only new rows are inserted.
I use a temporary staging table to speed up the procedure.
pg_read_file()
has restrictions to it. I quote the manual:And:
So you have to put your source file there - or create a symbolic link to your actual file/directory.
Or you can provide the file via Java in your case (I did it all inside Postgres).
Or you can import the data into 1 column of 1 row of a temporary table and take it from there.
Or you can use
lo_import
like demonstrated in this related answer on dba.SE.Tested with Postgres 8.4, 9.0 and 9.1.
XML has to be well-formed.
This blog post by Scott Bailey helped me.
PostgreSQL has an XML datatype. There are lots of XML specific functions you can use to query and modify the data, such as with xpath.
From the Java side, you can pretend you're just working with strings, but know that the data is well-formed on the way out and it won't let you store non-well-formed data.