A Radiologist’s Exploration of the Stanford ML Group’s MRNet data
This post reviews the recently released Stanford MRNet knee MRI data set and competition. As I am a senior radiology resident, I will focus on exploring the data through basic domain knowledge — addressing aspects of the data distribution that non-physicians may find perplexing. I’ll also include some Python code that interested parties may find useful in exploring the data set on their own.
The Stanford ML Group recently released their third public data set of medical imaging examinations, called MRNet, which can be found here. From the website:
The data set accompanies the publication of the Stanford ML Group’s work, which can be found here. Once again, they are hosting a competition to drive innovation in automated analysis of medical imaging.
Entries into the competition will be evaluated on a private test data set with the following metric:
You can find more details about the competition on the website. My goals for this post are as follows.
Magnetic resonance imaging (MRI) is a cross-sectional imaging modality, meaning that 2D images are acquired more-or-less sequentially in different imaging planes. The standard planes of imaging included in the MRNet data set are: axial, coronal and sagittal. MR images can be acquired in any plane, but that is beyond the scope of this post.
MR images are acquired by sequences of radiofrequency (RF) pulses (pulse sequences or — simply — sequences). Different sequences are designed to produce a signal that can be acquired and processed to reveal different patterns of signal intensity in biologic tissues. T1-weighting, T2-weighting, and proton density (PD)-weighting are the 3 core pulse sequence types used in musculoskeletal imaging.
It is helpful to know what the basic signal intensity patterns of fat, water, and muscle are for these 3 sequences (table below). Also helpful is the knowledge that cortical bone and fibrous structures (e.g. ligaments, menisci) should be dark on all sequences. Since fat and water are both intermediate-to-bright on T2 and PD sequences, fat saturation (fat-sat) is a technique that can be applied to enhance the appearance of water on these sequences.
Additional terminology:
Caveat: I have yet to review all of the data provided. These comments are based on a limited initial review of the data.
Again, the data can be obtained through the links above. You must first register an account with the Stanford ML Group and then you will be emailed a link to the data. Sharing the data is expressly forbidden, even with a teammate in the competition.
Once you’ve downloaded and unzipped the data, you should have a directory tree containing `train` and `valid` directories, each with `axial`, `coronal`, and `sagittal` subdirectories, alongside the label `*.csv` files.
The `*.csv` files contain the labels for the cases. The `*.npy` files contained in the subdirectories of `train` and `valid` are NumPy arrays of dimension `(slices, x, y)`. The `x` and `y` dimensions are consistently 256 x 256 across all exams, with `int` values ranging from 0 to 255. This implies that the pixel data has already been normalized by the Stanford ML Group.
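As a quick sanity check, you can load a single series and confirm the shape, data type, and value range yourself. Here's a minimal sketch; the unzip location and file name are assumptions on my part, so adjust the path to match your download.

```python
from pathlib import Path
import numpy as np

# Assumed unzip location and file name -- adjust to match your download.
series_path = Path("MRNet-v1.0") / "train" / "axial" / "0000.npy"

stack = np.load(series_path)
print(stack.shape)               # (slices, 256, 256)
print(stack.dtype)               # an integer dtype
print(stack.min(), stack.max())  # values should fall within 0-255
```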
Note: The image stack for each exam may contain different numbers of images and each exam may have a different number of slices for any given plane. This is completely normal for medical imaging data.
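To see that variability in your own copy of the data, you can survey the slice counts for a single plane. Again, the path below is an assumption:

```python
from pathlib import Path
import numpy as np

axial_dir = Path("MRNet-v1.0") / "train" / "axial"  # assumed location

# mmap_mode="r" avoids reading the full pixel data just to get the shape.
slice_counts = [
    np.load(f, mmap_mode="r").shape[0] for f in sorted(axial_dir.glob("*.npy"))
]

print(min(slice_counts), max(slice_counts))  # slice counts differ from exam to exam
```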
So as not to run afoul of the competition rules, I will refrain from posting summary data tables of the overlap in labels. However, it is pertinent to note the following:
The reasons for this are as follows:
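If you'd like to tabulate the label overlap for yourself, a few lines of pandas will do it. The sketch below assumes the label files are named like train-abnormal.csv and contain two header-less columns (case ID and binary label); check this against your own download.

```python
import pandas as pd

# Assumed file names and layout: two columns (case ID, binary label), no header row.
tasks = ["abnormal", "acl", "meniscus"]
labels = {
    task: pd.read_csv(
        f"MRNet-v1.0/train-{task}.csv",
        header=None,
        names=["case", task],
        dtype={"case": str},
    ).set_index("case")[task]
    for task in tasks
}

df = pd.concat(labels, axis=1)  # one row per case, one column per task

# Cross-tabulations worth inspecting (results not shown here, per the competition rules):
print(pd.crosstab(df["acl"], df["meniscus"]))
print(df.groupby("abnormal")[["acl", "meniscus"]].sum())
```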
Preview of the image data
Though the website states that the standard protocol for the MRI knee exams in the data set includes a variety of sequences, the data that I’ve reviewed thus far contains three sequences per case:
1. Basic knee anatomy:
I’ll include an image here for a basic overview of knee anatomy. For those interested in learning more about the appearance of knee anatomy on MRI, this website (the source for the below image) might be helpful.
2. Anterior cruciate ligament (ACL) tear:
First, I’ll show a sagittal T2 fat-sat image of a normal ACL. Note: the ripples of increased signal in the image just above the ACL represent pulsation artifact from the popliteal vessels.
And now, a torn ACL. In the following image, the red arrow points to an oblong structure of relatively high signal intensity and the normal dark band of fibrous ligament (seen in the above image) is absent.
Again, at the time of this writing, I haven’t yet explored the data in its entirety. However, I have observed a few potentially “bad” data points that add to the challenge of this competition. I say potentially bad, because some of these may be handled fairly well by a deep learning model. Here, I show a couple of examples and give brief explanations for why these represent troublesome data points.
One issue I observed was a sagittal image stack where the majority of the knee is outside the field of view (FOV), resulting in an essentially useless stack of images for the competition tasks, as none of the relevant anatomic structures are visible. This may have been due to an error in image preprocessing during curation of the MRNet data set or, alternatively, to an issue with the source data. Thus far, it is an infrequently encountered issue.
Another cause of potentially “bad” data is fairly typical for medical imaging. In a few cases, I’ve seen image artifact bad enough to make the images very challenging for a human to read. However, it’s possible that a deep learning algorithm could “read through” the artifact, just as we radiologists try to do in the interest of patient care. Here, the pulsation of the popliteal artery results in aliasing artifact throughout the image at the level of the knee joint.
The following code will load the data from one case into a `dict` of NumPy arrays, which is then used by the `KneePlot` class to generate the interactive plot shown in the `*.gif` below.
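Here is a minimal sketch of such code. It assumes a Jupyter notebook with matplotlib and ipywidgets installed; the paths, case number, and helper function names are illustrative placeholders to adapt to your own setup.

```python
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, IntSlider


def load_one_stack(case, data_path=Path("MRNet-v1.0/train"), plane="coronal"):
    # Load a single series (one imaging plane) for one case as a NumPy array.
    return np.load(data_path / plane / f"{case}.npy")


def load_stacks(case, data_path=Path("MRNet-v1.0/train")):
    # Load all three planes for one case into a dict keyed by plane name.
    return {
        plane: load_one_stack(case, data_path, plane)
        for plane in ("axial", "coronal", "sagittal")
    }


class KneePlot:
    """Interactive slice viewer for one exam (a dict of plane -> 3D array)."""

    def __init__(self, stacks, figsize=(8, 8)):
        self.stacks = stacks
        self.figsize = figsize

    def _plot_slice(self, plane, im_slice):
        fig, ax = plt.subplots(figsize=self.figsize)
        ax.imshow(self.stacks[plane][im_slice, :, :], cmap="gray")
        ax.set_title(f"{plane}, slice {im_slice}")
        ax.axis("off")
        plt.show()

    def draw(self, plane="coronal"):
        # One slider per plane; call draw() again with another plane to switch.
        max_slice = self.stacks[plane].shape[0] - 1
        interact(
            lambda im_slice: self._plot_slice(plane, im_slice),
            im_slice=IntSlider(min=0, max=max_slice, value=max_slice // 2),
        )


# Usage in a notebook cell (case "0000" is just an example file name):
# stacks = load_stacks("0000")
# plot = KneePlot(stacks)
# plot.draw(plane="sagittal")
```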
I hope this post gives you a feel for the MRNet data set. Perhaps more importantly, I hope you’ve learned a little about knee MRI. Though I’ve yet to explore it in its entirety, I think this data set will be a valuable resource for the ML community. And I look forward to reading about the models developed through the competition. Thank you for reading!