This README file describes the dataset "Training and Validation Data for Automated Head versus Tail Classification and Cell Identification in C. elegans," located at http://hdl.handle.net/1853/53154.

For questions about this dataset, please contact Hang Lu, Georgia Tech, at hang.lu@gatech.edu or 404-894-8473.

The data contained in this dataset were collected at the Georgia Institute of Technology, in Atlanta, Georgia, from 2012 to 2013.

This data collection contains images used to train and validate a bright field head versus tail classifier and fluorescent cell pair detection classifiers for automatic image processing in C. elegans studies.

The collection consists of three zip files:

- HeadvTailTrainingData.zip contains bright field images and annotation data for training of the head versus tail classifier.
- HeadvTailTestingData.zip contains bright field images and annotation data for validation of the head versus tail classifier.
- CellIDTrainingandTestingData.zip contains fluorescent images and annotation data for training and validation of the single cell pair and multi-cell pair identification tools.

Additional information about specific files within these folders is provided below.

****************************************************************************************

Within HeadvTailTrainingData.zip:

This folder contains images and annotations used for the training of the head versus tail classifier. Each folder contains a separate imaging trial with multiple images. The naming convention for the folder is as follows:

HTTrain-(Strain Name)(Age)(Food Condition)(Binning)x(Number)

HeadvTailTrainingData.zip contains 14 items -- 1 READMETraining.txt file and 13 folders.

Within each folder, there are three distinct types of files:

- Files ending with xMP(number).tiff are minimum projections of a stack of 3 brightfield images taken at 15 micron z intervals. Each folder contains 190 of these files.
- Files ending with xBWAnnotated(number).tiff are the annotated binarized images produced by Niblack segmentation. All binary particles that are candidates for the grinder particle going into layer 1 of classification are represented by image regions with nonzero pixel intensities. Particles manually identified as the grinder are labeled with a pixel intensity of 255. All other particles have a pixel intensity of 51. Each folder contains 190 of these files.
- A MATLAB data file ending with xAnnotationData.mat contains the corresponding ground truth labels for all of the particles in the annotation images. The data file contains two variables. The variable "labels" is a two-column matrix where the first column identifies the image and the second column identifies the particle number (enumerated from left to right). The variable "partID" is a two-column matrix where the first column identifies all manually identified grinder particles with a 1. The second column is 1 when an image is suitable for analysis and 0 if it is excluded from the analysis due to imaging artifacts or an improperly loaded worm.

All image files show the worm with the anterior-posterior axis oriented vertically. All images are taken with a 40X oil objective on an Olympus IX73 microscope with a Hamamatsu Flash 4.0 camera.

Folder N2Day1FedBin1x1 contains 1 MATLAB file, 190 annotated images, and 190 brightfield images.
Folder N2Day1FedBin2x1 contains 1 MATLAB file, 97 annotated images, and 97 brightfield images.
Folder N2Day2FedBin2x1 contains 1 MATLAB file, 102 annotated images, and 102 brightfield images.
Folder N2Day2FedBin2x2 contains 1 MATLAB file, 119 annotated images, and 119 brightfield images.
Folder N2Day2StarvedBin2x1 contains 1 MATLAB file, 97 annotated images, and 97 brightfield images.
Folder N2Day2StarvedBin2x2 contains 1 MATLAB file, 94 annotated images, and 94 brightfield images.
Folder N2Day2StarvedBin2x3 contains 1 MATLAB file, 205 annotated images, and 205 brightfield images.
Folder N2Day3FedBin2x1 contains 1 MATLAB file, 60 annotated images, and 60 brightfield images.
Folder N2Day3FedBin2x2 contains 1 MATLAB file, 101 annotated images, and 101 brightfield images.
Folder N2Day3FedBin2x3 contains 1 MATLAB file, 165 annotated images, and 165 brightfield images.
Folder N2Day3StarvedBin2x1 contains 1 MATLAB file, 47 annotated images, and 47 brightfield images.
Folder N2Day3StarvedBin2x2 contains 1 MATLAB file, 65 annotated images, and 65 brightfield images.
Folder N2Day3StarvedBin2x3 contains 1 MATLAB file, 129 annotated images, and 129 brightfield images.

****************************************************************************************

Within HeadvTailTestingData.zip:

This folder contains images and annotations used for the testing of the head versus tail classifier. Each folder contains a separate imaging trial with multiple images. The naming convention for the folder is as follows:

HTTest-(Strain Name)(Age)(Food Condition)(Binning)x(Number)

HeadvTailTestingData.zip contains 17 items -- 1 READMETesting.txt file and 16 folders.

Within each folder, there are two distinct types of files:

- Files ending with xMP(number).tiff are minimum projections of a stack of 3 brightfield images taken at 15 micron z intervals.
- A MATLAB data file ending with xHTID.mat contains the corresponding ground truth labels for all of the particles in the annotation images. The data file contains two variables. The variable "labels" is a vector identifying each of the images in the folder. The variable "HTID" is a vector that identifies the images specified in "labels" with either a 1 for heads or 0 for tails. Images labeled with a 2 suffer from motion artifacts or poor worm loading and are excluded from analysis.

All image files show the worm with the anterior-posterior axis oriented vertically.
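
As an illustration of the HTID coding described above, the following Python sketch pairs each image identifier with its label and drops the excluded images. It is only a sketch under stated assumptions: the function name summarize_htid and the example values are hypothetical, and it assumes that "labels" and "HTID" have already been read into plain Python sequences of equal length (for example with scipy.io.loadmat, if the file format version allows it). Loading the .mat file itself is not shown.

```python
from collections import Counter

# Label codes as described for the "HTID" vector in this README.
HEAD, TAIL, EXCLUDED = 1, 0, 2


def summarize_htid(labels, htid):
    """Pair each image identifier with its HTID code and drop excluded images.

    `labels` and `htid` are plain sequences of equal length. How they are
    read from the xHTID.mat file is outside the scope of this sketch.
    """
    if len(labels) != len(htid):
        raise ValueError("labels and HTID must have the same length")
    # Keep only images that are not flagged with the exclusion code 2.
    usable = [(img, code) for img, code in zip(labels, htid) if code != EXCLUDED]
    counts = Counter(code for _, code in usable)
    summary = {
        "head": counts[HEAD],
        "tail": counts[TAIL],
        "excluded": len(labels) - len(usable),
    }
    return usable, summary


# Made-up example values, not taken from the dataset.
example_labels = [1, 2, 3, 4, 5]
example_htid = [1, 0, 2, 1, 0]
usable, summary = summarize_htid(example_labels, example_htid)
print(summary)
```

With the made-up values above, image 3 is dropped because it carries the exclusion code 2, leaving two heads and two tails.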
All images are taken with a 40X oil objective on an Olympus IX73 microscope with a Hamamatsu Flash 4.0 camera.

Folder N2Day1FedBin1x1 contains 1 MATLAB file and 146 brightfield images.
Folder N2Day1FedBin2x1 contains 1 MATLAB file and 123 brightfield images.
Folder N2Day2FedBin2x1 contains 1 MATLAB file and 198 brightfield images.
Folder N2Day2FedBin2x2 contains 1 MATLAB file and 101 brightfield images.
Folder N2Day2FedBin2x3 contains 1 MATLAB file and 225 brightfield images.
Folder N2Day2StarvedBin2x1 contains 1 MATLAB file and 112 brightfield images.
Folder N2Day2StarvedBin2x2 contains 1 MATLAB file and 11 brightfield images.
Folder N2Day2StarvedBin2x3 contains 1 MATLAB file and 80 brightfield images.
Folder N2Day3FedBin2x1 contains 1 MATLAB file and 207 brightfield images.
Folder N2Day3FedBin2x2 contains 1 MATLAB file and 218 brightfield images.
Folder N2Day3FedBin2x3 contains 1 MATLAB file and 144 brightfield images.
Folder N2Day3StarvedBin2x1 contains 1 MATLAB file and 162 brightfield images.
Folder N2Day3StarvedBin2x2 contains 1 MATLAB file and 116 brightfield images.
Folder N2Day3StarvedBin2x3 contains 1 MATLAB file and 71 brightfield images.
Folder QH3833Day2FedBin2x1 contains 1 MATLAB file and 284 brightfield images.
Folder QH3833Day2FedBin2x2 contains 1 MATLAB file and 246 brightfield images.

****************************************************************************************

Within CellIDTrainingandTestingData.zip:

This folder contains images and annotations used for the testing and training of the cell identification classifiers. Each folder contains a separate imaging trial with multiple images. The naming convention for the folder is as follows:

CellID(OnePair/TwoPair)(Test/Train)-(Number)

CellIDTrainingandTestingData.zip contains 24 items -- 1 READMECellID.txt and 23 folders.

Folders labeled with the OnePair designation hold data sets used for training and testing of the classifier for the identification of one cell pair.
These folders contain fluorescent imaging data collected from the worm strain QL296 drcSi89[pdaf-7::GFP; unc-119(+)].

Folders labeled with the TwoPair designation hold data sets used for the training and testing of the classifier for the identification of two cell pairs. These folders contain fluorescent imaging data collected from the worm strain QL617 drcSi68[unc-119(+); Pins-6::mCherry]II; gjIs140[dpy-20(+); gpa-4::GFP].

The Test or Train designation in the folder name signifies whether the data set was used for training or testing the classifier.

Within each folder, there are three distinct types of files:

- Files ending with xMP(Red/Green)(number).tiff are maximum projections of fluorescent z-stacks collected with an mCherry filterset (Red) or GFP filterset (Green). For the two pair cell identification problem, the red channel shows all four neurons (ASI and ASJ) and the green channel shows only the ASI neurons. The green channel is used to determine the true identity of the cells in testing and validation.
- Files ending with xBWAnnotated(number).tiff are the annotated binarized images produced by Niblack segmentation. All binary particles that are candidates for cells going into layer 1 of classification are represented by image regions with nonzero pixel intensities. Particles manually identified as potential cells are labeled with a pixel intensity of 255. All other particles have a pixel intensity of 51.
- A MATLAB data file ending with xAnnotationData.mat contains the corresponding ground truth labels for all of the particles in the annotation images. The data file contains two variables. The variable "labels" is a two-column matrix where the first column identifies the image and the second column identifies the particle number (enumerated from left to right). The variable "partID" is a two-column matrix where the first column identifies all manually identified cell particles with a nonzero number (1 for ASI and 2 for ASJ).
  The second column specifies whether the image is suitable for analysis. For one pair detection, images labeled with a 1 in this column are used for training and testing. Images labeled with a 2 are only used for the training of the layer 1 classifier due to absent cells or misorientation of the worm. For two pair cell detection, images labeled with a 1 are used for training and testing. Images labeled with a 0 are not used due to absent cells or misorientation of the worm.

All image files show the worm with the anterior-posterior axis oriented vertically (anterior down). QL296 data for the one pair application were taken with a 40X oil objective on a Leica DMI6000B microscope with a Hamamatsu Orca D2 camera. QL617 data for the two pair application were taken with a 40X oil objective on an Olympus IX73 microscope with a Hamamatsu Flash 4.0 camera.

Folder CellIDOnePairTest-1 contains 1 MATLAB file, 91 annotated images, and 91 fluorescent images.
Folder CellIDOnePairTest-2 contains 1 MATLAB file, 65 annotated images, and 65 fluorescent images.
Folder CellIDOnePairTest-3 contains 1 MATLAB file, 72 annotated images, and 72 fluorescent images.
Folder CellIDOnePairTest-4 contains 1 MATLAB file, 67 annotated images, and 67 fluorescent images.
Folder CellIDOnePairTest-5 contains 1 MATLAB file, 63 annotated images, and 63 fluorescent images.
Folder CellIDOnePairTest-6 contains 1 MATLAB file, 65 annotated images, and 65 fluorescent images.
Folder CellIDOnePairTest-7 contains 1 MATLAB file, 72 annotated images, and 72 fluorescent images.
Folder CellIDOnePairTest-8 contains 1 MATLAB file, 67 annotated images, and 67 fluorescent images.
Folder CellIDOnePairTest-9 contains 1 MATLAB file, 63 annotated images, and 63 fluorescent images.
Folder CellIDOnePairTrain-1 contains 1 MATLAB file, 67 annotated images, and 67 fluorescent images.
Folder CellIDOnePairTrain-2 contains 1 MATLAB file, 82 annotated images, and 82 fluorescent images.
Folder CellIDOnePairTrain-3 contains 1 MATLAB file, 16 annotated images, and 16 fluorescent images.
Folder CellIDOnePairTrain-4 contains 1 MATLAB file, 19 annotated images, and 19 fluorescent images.
Folder CellIDOnePairTrain-5 contains 1 MATLAB file, 22 annotated images, and 22 fluorescent images.
Folder CellIDOnePairTrain-6 contains 1 MATLAB file, 7 annotated images, and 7 fluorescent images.
Folder CellIDOnePairTrain-7 contains 1 MATLAB file, 5 annotated images, and 5 fluorescent images.
Folder CellIDTwoPairTest-1 contains 1 MATLAB file, 61 annotated images, 61 mCherry filterset fluorescent images, and 61 GFP filterset fluorescent images.
Folder CellIDTwoPairTrain-1 contains 1 MATLAB file, 66 annotated images, 66 mCherry filterset fluorescent images, and 66 GFP filterset fluorescent images.
Folder CellIDTwoPairTrain-2 contains 1 MATLAB file, 52 annotated images, 52 mCherry filterset fluorescent images, and 52 GFP filterset fluorescent images.
Folder CellIDTwoPairTrain-3 contains 1 MATLAB file, 59 annotated images, 59 mCherry filterset fluorescent images, and 59 GFP filterset fluorescent images.
Folder CellIDTwoPairTrain-4 contains 1 MATLAB file, 77 annotated images, 77 mCherry filterset fluorescent images, and 77 GFP filterset fluorescent images.
Folder CellIDTwoPairTrain-5 contains 1 MATLAB file, 20 annotated images, 20 mCherry filterset fluorescent images, and 20 GFP filterset fluorescent images.
Folder CellIDTwoPairTrain-6 contains 1 MATLAB file, 48 annotated images, 48 mCherry filterset fluorescent images, and 48 GFP filterset fluorescent images.

Change Log
--------------------
2015-01-16 -- high-level folder structure added to README by Lizzy Rolando, data curator.