Genomic selection

Why is it important to increase the rate of developing improved varieties?

The ability to continuously develop improved crop varieties is central to increasing food production and mitigating the effects of climate change.

To meet the food demands of the forecast increase in global population, the production of staple cereals, especially maize, wheat and rice, must double in the next thirty years.

A number of new breeding technologies, including Genomic Selection, are taking advantage of the large amount of genomic data available for many crops and the continuing fall in the price of large-scale, high-throughput DNA extraction and sequencing.

Back to basics: breeding for better crop varieties

The main aim of any plant improvement programme is to introduce beneficial genetic variation, such as resistance to a pest, disease or drought, or improved yield and nutritional qualities, into so-called “elite” crop varieties.

In its simplest form, a plant with the desired new characteristic or characteristics is crossed with the elite variety or varieties to be improved. The progeny is then screened for plants that contain the new characteristic but also carry all the valued characteristics of the elite parent. Several generations of crossing back to the elite parent are performed to preserve its good attributes in the new variety.

The simplest and most traditional way of selecting an improved crop is by its phenotype, i.e. by the way it looks or responds to the environment. In many (or even most) cases, however, this is not simple, and the process can cost a lot of money and take a very long time.

For example, some of the beneficial characteristics may not always be obvious, such as resistance to a pest or disease in the absence of that pest or disease, or drought resistance in a year with good rainfall. Or the characteristics may take a long time to show. Trees, for example, fruit only after many years, so screening to select for specific fruit qualities requires a very long wait and a lot of space, labour and resources. And crucially, since many beneficial qualities are required together, the breeder must take care not to lose important characteristics when selecting for new ones, and must maintain the high yield potential of existing varieties. This is especially challenging for characteristics that are controlled by a large number of genes, as many are.

Ultimately it is a numbers game: how can you increase the effectiveness of the breeding programme (the number of improved varieties produced) in the fastest and cheapest way?

How can genetic markers help?

All the individuals in a species have the same set of genes, but individual genes exist in different forms (with slight differences in their DNA sequence) called alleles. In sexually reproducing species (all the seed crops and, of course, humans and other animals) an individual inherits one allele from its mother and one from its father so, effectively, alleles come in pairs.

Different alleles have different effects on the individual characteristics and the aim of crop breeding programmes is to bring together as many beneficial alleles as possible.

Genetic markers are used to determine the presence of distinct alleles in individuals by detecting differences in their DNA sequence. Using markers in this way to guide selection is called marker-assisted selection, or MAS for short.

In the simplest example, a genetic marker associated with a dominant allele for a characteristic controlled by a single gene will always be associated with that characteristic. In the example of yellow colour in maize, the Y allele leads to the accumulation of carotenoids in the seed and results in yellow kernels, while two copies of the y allele result in white kernels (no carotenoids are produced). The Y marker will have the maximum breeding score of 1.0, because all the plants that contain at least one copy of the Y allele will have yellow kernels. This means that the correlation between the marker and the colour of the seed is 100%.
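As a minimal sketch of why the score reaches 1.0, the Python snippet below records, for a handful of made-up plants, whether the Y marker is present and whether the kernels are yellow, and then computes the correlation between the two. The genotype list and the helper names (`has_y_marker`, `kernel_is_yellow`, `pearson`) are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical plants: two alleles each at the kernel-colour gene.
plants = ["YY", "Yy", "yy", "Yy", "yy", "YY"]

def has_y_marker(genotype):
    """1 if the plant carries at least one dominant Y allele, else 0."""
    return int("Y" in genotype)

def kernel_is_yellow(genotype):
    """Single dominant gene: kernels are yellow unless both alleles are y."""
    return int(genotype != "yy")

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

marker = [has_y_marker(g) for g in plants]
yellow = [kernel_is_yellow(g) for g in plants]

score = pearson(marker, yellow)
print(score)  # marker and phenotype always agree, so the value is (numerically) 1
```

Because the phenotype here follows directly from the genotype, the marker is a perfect predictor. For characteristics controlled by many genes, the same calculation would typically give a much weaker correlation for any single marker.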

The breeder would therefore have two options for selecting plants with white kernels in populations carrying both the Y and y alleles: 1) wait until the plants produce seeds and look at the colour, or 2) test for the absence of the Y marker. The first option would require a wait of several months and significant space and resources, while in the second the plants could be tested a few days after germination, while still at the nursery. Unwanted plants could then simply be removed, and many more plants could be tested for the same amount of time and resources.
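The second option can be illustrated with a short Python sketch. It assumes, purely for the sake of example, that both parents are heterozygous (Y/y), lists the allele combinations their offspring can inherit, and keeps only those that lack the Y marker. The variable names are hypothetical.

```python
from itertools import product

# Assumption for illustration: both parents carry one Y and one y allele.
mother, father = "Yy", "Yy"

# Each offspring inherits one allele from each parent (a Punnett square).
offspring = ["".join(sorted(pair)) for pair in product(mother, father)]

# Option 2 from the text: keep only seedlings in which the Y marker is absent.
white_kernel_seedlings = [g for g in offspring if "Y" not in g]

print(offspring)
print(len(white_kernel_seedlings), "of", len(offspring),
      "allele combinations lack the Y marker")
```

Under this assumed cross, only one in four combinations gives white kernels, so discarding the rest at the seedling stage avoids growing most of the unwanted plants to maturity.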

It is mostly not so simple

The situation is usually more complex because many characteristics (even most) are controlled by many genes, each perhaps with only a small effect on the overall phenotype. Some of these genes may have opposing effects and interact with each other in ways that are sometimes poorly understood.

Another complicating factor is that the environment often plays a big role in specific phenotypes. This makes the phenotype in question less heritable, meaning that the progeny will only show the same phenotype as the parents if the environment is the same. There are also complicated genetic interactions, such as the effect of one allele changing depending on the presence of other alleles, or related individuals behaving differently from the rest of the population. So determining the relationship between sets of markers and beneficial agronomic characteristics becomes quite tricky.
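One common way to put a number on how much the environment blurs inheritance is the ratio of genetic variance to total phenotypic variance (often called broad-sense heritability). The sketch below is illustrative only: the genetic values and environmental effects are invented, and it assumes that a plant's phenotype is simply its genetic value plus an environmental effect.

```python
from statistics import pvariance

def broad_sense_heritability(genetic_values, environment_effects):
    """Share of phenotypic variance that is due to genetic differences."""
    phenotypes = [g + e for g, e in zip(genetic_values, environment_effects)]
    return pvariance(genetic_values) / pvariance(phenotypes)

# Made-up genetic values (e.g. yield potential) for four plants.
genetic_values = [10.0, 12.0, 14.0, 16.0]

# Same environment for every plant: the phenotype reflects genetics only.
stable = broad_sense_heritability(genetic_values, [0.0, 0.0, 0.0, 0.0])

# Variable environment: part of the visible differences is not genetic.
variable = broad_sense_heritability(genetic_values, [2.0, -2.0, -2.0, 2.0])

print(stable, variable)
```

The lower the ratio, the less a plant's appearance says about what it will pass on to its progeny, which is why selecting on phenotype alone becomes unreliable.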

What is Genomic Selection (GS)?

Genomic selection (GS) is a new breeding method used in crops and livestock, and is an extension of marker-assisted selection (MAS).

GS uses statistical methods to combine marker data with phenotypic data in order to predict the breeding (or genetic) values of individuals in a breeding population.

The key difference between MAS and GS is that selection in GS is based on the breeding value of individuals as indicated by all available, genome-wide marker data, whereas selection in MAS is guided by only a subset of the available genetic markers.

This is how it works:

• A ‘reference’ or ‘training’ population is genotyped for all the available genetic markers and is also phenotyped.

• The correlation between the presence of specific genetic markers and desirable characteristics in the crop is determined.

• This data is then used to develop a GS breeding model. The model combines the scores for all the available markers to work out the best combinations to select for breeding superior individuals. As the data sets are very large, and the correlations between markers and phenotypes can be variable (for example, due to changes in the environment) and are not always fully understood, the use of statistical tools to develop the GS model is essential. In a recent experiment to determine the suitability of GS in rice, over 73,000 genetic markers were scored in a training population of 316 plants, while breeding populations can exceed 20,000 individual plants.

• The GS breeding model is then validated on a subset of the training population to determine how well it predicts desirable phenotypes.

• The DNA of the breeding population is subsequently sequenced (but the plants are not phenotyped) to obtain genetic marker information. The GS breeding model is used to estimate the breeding value of individuals. High breeding values are correlated with the presence of ‘good’ genetic markers. It is important that the training and breeding populations are as closely related as possible, and therefore the design of the former is very important.

• Promising individuals can be selected as breeding parents and unwanted individuals can be removed from the breeding population. This alone represents a very big advantage of GS, since it saves time. Traditionally, candidate plants were screened before selecting them, which could mean skipping one planting season. GS can also increase the accuracy of predicting good lines.
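The steps above can be sketched in a few lines of Python. This is a toy illustration on simulated data, not the method used in any particular breeding programme: the population sizes, marker effects and penalty value are made up, and ridge regression is used here simply as one common statistical choice for estimating many small marker effects at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_breed, n_markers = 500, 1000, 200  # made-up sizes

# Marker genotypes coded as 0, 1 or 2 copies of an allele (simulated).
X_train = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
X_breed = rng.integers(0, 3, size=(n_breed, n_markers)).astype(float)

# Simulated "truth": many small marker effects plus environmental noise.
true_effects = rng.normal(0.0, 0.2, size=n_markers)
y_train = X_train @ true_effects + rng.normal(0.0, 1.0, size=n_train)

def fit_ridge(X, y, penalty):
    """Estimate one effect per marker with closed-form ridge regression."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    A = Xc.T @ Xc + penalty * np.eye(X.shape[1])
    effects = np.linalg.solve(A, Xc.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ effects
    return intercept, effects

def predict(X, intercept, effects):
    """Predicted breeding value: intercept plus the sum of marker effects."""
    return intercept + X @ effects

# Build the model on most of the training population (genotypes + phenotypes).
fit_idx, val_idx = np.arange(0, 400), np.arange(400, n_train)
intercept, effects = fit_ridge(X_train[fit_idx], y_train[fit_idx], penalty=25.0)

# Validate on the held-out part of the training population.
val_pred = predict(X_train[val_idx], intercept, effects)
accuracy = np.corrcoef(val_pred, y_train[val_idx])[0, 1]

# Estimate breeding values for the breeding population (genotypes only).
gebv = predict(X_breed, intercept, effects)

# Keep the top 5% as candidate parents; the rest can be discarded early.
n_selected = n_breed // 20
selected = np.argsort(gebv)[::-1][:n_selected]

print("validation accuracy:", round(float(accuracy), 2))
print("number of candidates selected:", len(selected))
```

Note that the breeding population never needs to be phenotyped in this sketch: once the marker effects have been estimated, a genotype alone is enough to rank the candidates.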

Figure: (A, left) Training/reference population: phenotyped and genotyped. Used to develop genetic breeding values and the GS model. (B, right) Breeding population: genotyped only. Marker information is used to answer the question: which individuals are the best parents?

Hence, in GS, selection is separated from phenotyping, which is important because phenotyping is much more expensive and time-consuming than genotyping.

So, GS is a kind of “educated guess” and greatly reduces the time for each generation in breeding programmes and therefore makes developing new crop varieties faster and cheaper. GS really is a data mining exercise. It requires significant investments together with the ability to generate, manage and analyse very large datasets.

The key tool in GS is statistics. Different breeding models need to be developed and tested with different statistical and computer modelling tools.