Dr Ilya Goldberg, NIA-NIH
Our lab develops pattern-recognition software (WND-CHARM) for image processing that is independent of modality or image content (Orlov et al., 2006). Instead of selecting appropriate algorithms for an image processing task, or specifying parameters for these algorithms, the inputs are limited to the association of images into pre-defined classes. These classes can be terms in an ontology, or can be defined ad hoc for an image data set based on experimental conditions or manual observations. A set of classes defines a training set for a machine classifier, which can then calculate the degree of similarity between any given image and the classes it was trained with (Shamir et al., 2010). Similarly, any pair of images can be quantitatively compared in the context of a trained classifier (Johnston et al., 2008). Several different classifiers can be defined for a given set of images, depending on the different ways that the images in the set can be grouped (Shamir et al., 2008). Intuitively, two images can be similar to each other or not depending on the context of the similarity (that is, similar in what way?). Thus classifiers can be used to define a context for measuring similarity between images in an objective and quantitative way, without fine-tuning the analysis parameters, based solely on known associations of images into groups (Shamir et al., 2010). An appropriate measure of similarity forms the basis of image-based search, where the query to an image repository is an image rather than a text term. There are several implementations of image-based search, including one in PSLID from our collaborators in this project (Hu and Murphy, 2004), but there is usually no mechanism to specify a context for the search, or to have the search results reported along several contexts pre-defined in the database.
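The idea of context-dependent similarity can be illustrated with a small sketch. This is not the WND-CHARM implementation: it is a simplified weighted-neighbor-distance style classifier, assuming each image has already been reduced to a numeric feature vector. The feature values, class names, function names and the exponent of -5 are all illustrative assumptions. The same pair of images is compared under two classifiers trained on different groupings of the same six images.

```python
from math import fsum


def fisher_weights(classes):
    """Per-feature weight: variance of the class means divided by the
    mean within-class variance (a Fisher-score style discriminant)."""
    n_features = len(next(iter(classes.values()))[0])
    weights = []
    for f in range(n_features):
        means, variances = [], []
        for samples in classes.values():
            vals = [s[f] for s in samples]
            m = fsum(vals) / len(vals)
            means.append(m)
            variances.append(fsum((v - m) ** 2 for v in vals) / len(vals))
        grand = fsum(means) / len(means)
        between = fsum((m - grand) ** 2 for m in means) / len(means)
        within = fsum(variances) / len(variances)
        weights.append(between / (within + 1e-12))  # epsilon avoids division by zero
    return weights


def class_similarities(image, classes, weights, power=-5):
    """Normalized similarity of one feature vector to each training class.
    Weighted squared distances are raised to a negative power (assumed value),
    so the nearest training samples dominate the class score."""
    raw = {}
    for name, samples in classes.items():
        total = 0.0
        for s in samples:
            d = fsum(w * (a - b) ** 2 for w, a, b in zip(weights, image, s))
            total += (d + 1e-12) ** power
        raw[name] = total / len(samples)
    norm = fsum(raw.values())
    return {name: v / norm for name, v in raw.items()}


def contextual_distance(img_a, img_b, classes):
    """Distance between two images measured in the context of one classifier:
    the Euclidean distance between their class-similarity vectors."""
    weights = fisher_weights(classes)
    sa = class_similarities(img_a, classes, weights)
    sb = class_similarities(img_b, classes, weights)
    return fsum((sa[k] - sb[k]) ** 2 for k in classes) ** 0.5


# Hypothetical two-feature vectors: (feature 0, feature 1).
# Context 1 groups the training images by feature 0, context 2 by feature 1.
context_1 = {
    "A": [(0.0, 0.0), (0.1, 1.0), (0.0, 0.5)],
    "B": [(1.0, 0.0), (0.9, 1.0), (1.0, 0.5)],
}
context_2 = {
    "X": [(0.0, 0.0), (1.0, 0.1), (0.5, 0.0)],
    "Y": [(0.0, 1.0), (1.0, 0.9), (0.5, 1.0)],
}

image_a = (0.05, 0.05)
image_b = (0.05, 0.95)

print("context 1:", contextual_distance(image_a, image_b, context_1))
print("context 2:", contextual_distance(image_a, image_b, context_2))
```

In this toy data the two query images agree on feature 0 and differ on feature 1, so they come out as near-identical in the first context and far apart in the second. The only inputs that change between the two runs are the class assignments of the training images.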
The growth of public repositories based on OME technology (JCB DataViewer and ASCB CELL) provides standardized access to the annotations and metadata for a growing collection of diverse image types. Additionally, many of the groups involved in this project have their own well-annotated and curated image collections, which in several cases have been evaluated as training sets for machine classifiers (Boland and Murphy, 2001; Shamir et al., 2008). Finally, well-annotated data are available from CCDB and from many other repositories, including those participating in this project. The mechanisms for accessing metadata in these collections, especially those already contained within OMERO-based repositories, together with the set of ontological terms defined by the OME Data Model, provide a natural mechanism for defining training sets for machine classifiers. A collection of classifiers trained on this publicly available metadata can then be used to process image-based queries along several defined contexts of comparison. Similarly, text-based queries can be processed not only against text fields, but also against machine-assigned annotations based on image similarity to training classes.
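As a rough sketch of how annotations could define training sets, the snippet below groups images by the value of one annotation key, so that each key yields a different set of classes over the same images. The record layout, key names and the `training_sets` helper are hypothetical and do not use the OMERO API. They only illustrate the grouping step.

```python
from collections import defaultdict


def training_sets(records, key, min_size=2):
    """Group image ids by the value of one annotation key.
    Images lacking that annotation are skipped, and classes with fewer
    than `min_size` images are dropped as too small to train on."""
    groups = defaultdict(list)
    for rec in records:
        value = rec["annotations"].get(key)
        if value is not None:
            groups[value].append(rec["image_id"])
    return {v: ids for v, ids in groups.items() if len(ids) >= min_size}


# Hypothetical annotated images; values are illustrative only.
records = [
    {"image_id": 1, "annotations": {"modality": "fluorescence", "phase": "metaphase"}},
    {"image_id": 2, "annotations": {"modality": "fluorescence", "phase": "interphase"}},
    {"image_id": 3, "annotations": {"modality": "DIC", "phase": "metaphase"}},
    {"image_id": 4, "annotations": {"modality": "DIC", "phase": "interphase"}},
    {"image_id": 5, "annotations": {"modality": "fluorescence"}},
    {"image_id": 6, "annotations": {"modality": "phase-contrast", "phase": "metaphase"}},
]

# Each annotation key defines one context, and hence one candidate classifier.
by_modality = training_sets(records, "modality")
by_phase = training_sets(records, "phase")
print(by_modality)
print(by_phase)
```

Image 5 has no `phase` annotation, so it contributes to the modality classes only. An image like this is the kind of candidate for which a machine-assigned annotation might later be proposed.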
The WND-CHARM classifier has been used to process images from a wide variety of imaging modalities (DIC, phase-contrast, fluorescence, histocytology, X-rays, MRI, etc.) without selecting algorithms or adjusting algorithm parameters (Shamir et al., 2009; Shamir et al., 2008). WND-CHARM has also been shown to report biologically meaningful quantitative image comparisons, allowing for interpolation between classes where this is appropriate, or reporting independently verifiable phenotypic similarities (Johnston et al., 2008). In this project, we will integrate WND-CHARM with OMERO so that images with defined metadata in OMERO can be used to define machine classifiers with little or no user input. Additional development will be required to find appropriate visualizations to report similarities along multiple contexts when performing image-based searches. One output of this work will therefore be a presentation of the diversity (or lack thereof) of existing repositories, and of their domains of uniqueness and overlap. These outputs are important, especially during these early phases, as they provide unbiased, quantitative measures of repository growth and maturation and indicate domains that are under-represented.
This work will occur as the data in various online repositories grow. At this time, we cannot know whether existing repositories will be diverse or representative enough to reliably perform automated annotation of images with missing metadata, but we will determine whether this is possible in specific cases within the domains covered by the repositories (e.g., mitotic cells). This project is thus a step towards the ultimate goal of automatic annotation of biological images.
Further information is available on the Goldberg Lab website.
Ilya is in charge of the Image Informatics and Computational Biology Unit at the National Institute on Aging (part of NIH).
Harry Hochheiser was a post-doctoral researcher in the Goldberg Lab from 2003 to 2006. He is now an Assistant Professor of Computer Science at Towson University, where he continues to work on OME, Bioinformatics, Human-Computer Interaction, Information Visualization, and the social implications of computing technologies.
Josiah Johnston was part of Ilya's lab at the National Institute on Aging (part of NIH). He has now moved on to study for a PhD. He earned a B.S. in Computer Science from the University of Arkansas, Fayetteville. His technical interests include working with ontologies, data modelling, and machine learning. Professionally, he is interested in developing technologies with positive social and economic implications on both local and global scales. OME technologies hold high promise of reducing medical costs through computer-assisted diagnosis, and of reducing the price of medicines by making novel drug discovery cheap and easy. In his free time, he works towards peace, social justice, and community development.
Tom Macura is a doctoral student in the Computer Laboratory at the University of Cambridge, United Kingdom, and a member of Trinity College. His undergraduate degrees are in Mathematics and Computer Science, from the University of Maryland, Baltimore County. His professional interests are Image Analysis, Machine Learning, and Content Based Image Retrieval of biological images. He holds an appointment as an NIH-Cambridge Scholar at the National Institute on Aging, Laboratory of Genetics, Image Informatics and Computational Biology Unit, which is headed by Dr. Ilya G. Goldberg.
Nikita Orlov is part of Ilya's lab at the National Institute on Aging (part of NIH).
Lior Shamir recently graduated with a Ph.D. in Computer Science from Michigan Tech. He is interested in different forms and applications of computer vision, including biomedical image analysis, astronomical image analysis, face recognition, object recognition, and automated analysis of visual art. He enjoys spending time with his daughter.