Parse DICOM files in native Python

2019-02-03 02:54发布

问题:

What is the simplest and most-pythonic way to parse a DICOM file?

A native Python implementation without the use of non-Python libraries would be much preferred. DICOM is the standard file format in digital medical imaging (look here for more information).

There are some C/C++ libraries that support reading (a subset) of DICOM files. Two or three of them even have Python bindings. A native Python parser would serve two purposes for me:

  1. No need to build any external C/C++ libraries.
  2. Learn about the DICOM file format.

回答1:

And as of today there's another pure Python package reading DICOM files available: pydicom



回答2:

I'm using pydicom heavily these days, and it rocks.

It's pretty easy to start playing with it:

import dicom 
data = dicom.read_file("yourdicomfile.dcm")

To get the interesting stuff out of that "data" object, somehow resembling dcmdump output:

for key in data.dir():        
    value = getattr(data, key, '')
    if type(value) is dicom.UID.UID or key == "PixelData":
        continue

    print "%s: %s" % (key, value)

I think a great way to learn more about the dicom format is to open similar files and write code to compare them according to various aspects: study description, window width and center, pixel representation and so on.

Have fun! :)



回答3:

If you want to learn about the DICOM format, "Digital Imaging and Communications in Medicine (DICOM): A Practical Introduction and Survival Guide" by Oleg Pianykh is quite readable and gives a good introduction to key DICOM concepts. Springer-Verlag is the publisher of this book. The full DICOM standard is, of course, the ultimate reference although it is somewhat more intimidating. It is available from NEMA (http://medical.nema.org).

The file format is actually less esoteric than you might imagine and consists of a preamble followed by a sequence of data elements. The preamble contains the ASCII text "DICM" and several reserved bytes that are unused. Following the preamble is a sequence of data elements. Each data element consists of the size of the element, a two-character ASCII code indicating the value representation, a DICOM tag, and the value. Data elements in the file are ordered by their DICOM tag numbers. The image itself is just another data element with a size, value representation, etc.

Value representations specify exactly how to interpret the value. Is it a number? Is it a character string? If it's a character string, is it a short one or a long one and which characters are permitted? The value representation code tells you this.

A DICOM tag is a 4 byte hexadecimal code composed of a 2 byte "group" number and a 2 byte "element" number. The group number is an identifier that tells you what information entity the tag applies to (for example, group 0010 refers to the patient and group 0020 refers to the study). The element number identifies the interpretation of the value (items such as the patient's ID number, the series description, etc.). To find out how you should interpret the value, your code looks up the DICOM tag in a dictionary file.

There are some other details involved, but that's the essence of it. Probably the most instructive thing you can do to learn about the file format is to take an example DICOM file, look at it with a hex editor, and go through the process of parsing it mentally. I would advise against trying to learn about DICOM by looking at existing open source implementations, at least initially. It is more likely to confuse instead of enlighten. Getting the big picture is more important. Once you have the big picture, then you can descend into subtleties.



回答4:

The library pydicom mentioned above seems like a great library for accessing the DICOM data structures. To use it to access e.g. RT DOSE data, I guess one would do something like

import dicom,numpy
dose = dicom.ReadFile("RTDOSE.dcm")
d = numpy.fromstring(dose.PixelData,dtype=numpy.int16)
d = d.reshape((dose.NumberofFrames,dose.Columns,dose.Rows))

and then, if you're in mayavi,

from enthought.mayavi import mlab
mlab.pipeline.scalar_field(d)

This gives wrong coordinates and dose scaling, but the principle ought to be sound.

CT data should be very similar.



回答5:

Newer gdcm development now happen here:

http://gdcm.sourceforge.net/

It supports Java and C# on top of python.

Why write yet another dicom implementation when you can centralize a single C++ implementation and have it accessible to so many different languages



回答6:

Some years ago I was looking for the same thing and found this: Python DICOM lib

I wasn't too impressed with the code, but it is native Python reading DICOM files.



回答7:

DICOM is a real pain... even when the manufacturer sticks to the standards. If you write your own DICOM library you'll find different manufacturers DICOMs are effectively incompatible with other vendors [citation needed].

I tried (in my spare time) writing a C dicom parser borrowing heavily from a nice little Ruby parser I came across cunningly called 'ruby-dicom'. It is actually very readable code (I looked at one of the smaller earlier versions).

The biggest headache was trying to amass a library of the header tags with expected data types. There are the standard-defined tags, and the vendor tags. The ruby-dicom files contain a library of tags in a text format that can be easily inspected.

I gave up on the official literature as I was only interested in the file format which seems to only be in one of the 10 or so huge PDFs.

My local DICOM files are not compressed and follow standard easy to code bit-arrangements, but be prepared for various compressions and strange 12-bit images stored in 8-bit containers with big or little endianness and no padding bits...

I gave up once time became very scarce.

Python is probably a far better choice than C for this style of header parsing though...



回答8:

There are some libraries (most often implemented in C/C++) with Python bindings, e.g.:

  • pydicomlib
  • gdcmPython

However, I'm looking for a native Python implementation to learn more about the DICOM file format.



回答9:

I wonder what the original poster tried and which methods worked and not worked for him. I have never worked with DICOM, but a quick google search for "DICOM python" gave several interesting results. It seems that this project: http://www.creatis.univ-lyon1.fr/Public/Gdcm/ should deliver what you want. It has python bindings and a pretty active mailing list.