The human genome is the genome of Homo sapiens. It is stored on 23 chromosome pairs in the cell nucleus and in the small mitochondrial DNA. These are usually treated separately as the nuclear genome and the mitochondrial genome. The genome is made up of approximately three billion base pairs of deoxyribonucleic acid (DNA), which together form the entire set of human chromosomes. A great deal is now known about the DNA sequences on our chromosomes. They include both protein-coding genes and noncoding DNA. What the DNA actually does is only partly known, and applying this knowledge in practice has only just begun. By 2003 the DNA sequence of almost the entire human genome was known.
The Human Genome Project (HGP) was the international, collaborative research program whose goal was the complete mapping and understanding of all the genes of human beings. It produced a reference sequence that is used worldwide in biology and medicine. There are 24 distinct human chromosomes: 22 autosomal chromosomes, plus the sex-determining X and Y chromosomes. Draft sequences were published in 2001: Nature published the publicly funded project’s report, and Science published Celera’s paper. These papers described how the draft sequence was produced and gave an analysis of the sequence. Improved drafts were announced in 2003 and 2005, filling in about 92% of the sequence.
A later project, ENCODE, studies the way genes are controlled.
DNA and proteins
The human genome is not uniform. It contains an estimated 20,000 to 25,000 protein-coding genes, far fewer than had been expected; more recent counts put the number at just over 20,000. Only about 1.5% of the genome codes for proteins. The rest includes non-coding RNA genes, regulatory sequences, and introns.
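To give a sense of scale, the round figures in this article (about three billion base pairs, of which about 1.5% code for proteins) can be turned into rough base-pair counts. The sketch below is only back-of-the-envelope arithmetic; the constant and function names are made up for this illustration, and the inputs are approximations rather than measured values.

```python
from fractions import Fraction

# Approximate figures taken from the article text (not precise measurements).
GENOME_BASE_PAIRS = 3_000_000_000      # about three billion base pairs
CODING_FRACTION = Fraction(15, 1000)   # about 1.5% codes for proteins


def split_genome(total_bp, coding_fraction):
    """Return (coding, noncoding) base-pair counts for a given coding fraction."""
    coding = int(total_bp * coding_fraction)  # exact arithmetic, then truncate
    return coding, total_bp - coding


coding_bp, noncoding_bp = split_genome(GENOME_BASE_PAIRS, CODING_FRACTION)
print(f"protein-coding: about {coding_bp:,} base pairs")
print(f"noncoding:      about {noncoding_bp:,} base pairs")
```

With these inputs, roughly 45 million base pairs are protein-coding, and the remaining 2.955 billion or so are not.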
However, a single gene can produce a variety of proteins by means of alternative RNA splicing. For example, one gene in the fruit fly Drosophila (DSCAM) can be alternatively spliced into about 38,000 different mRNAs, each coding for a different peptide chain. Therefore the number of proteins produced is far above the number of coding genes.
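The figure of about 38,000 comes from simple multiplication: the gene has several groups of alternative exons, and each mRNA uses exactly one exon from each group. The sketch below assumes the commonly cited group sizes for Drosophila DSCAM (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17); these numbers are not given in this article, and the names in the code are made up for this illustration.

```python
from math import prod

# Assumed sizes of the alternative exon groups in Drosophila DSCAM
# (commonly cited values, not stated in the article itself).
ALTERNATIVE_EXON_CHOICES = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def count_possible_mrnas(choices):
    """One exon is picked from each group, so the group sizes multiply."""
    return prod(choices.values())


total = count_possible_mrnas(ALTERNATIVE_EXON_CHOICES)
print(f"possible DSCAM mRNAs: {total:,}")
```

Under these assumptions the product is 12 × 48 × 33 × 2 = 38,016, which is where the rounded figure of 38,000 comes from.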
With RNA splicing and post-translational modification (changes made to a protein after it is produced), the total number of unique human proteins may be in the low millions.
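The idea that most noncoding DNA is useless ‘junk’ has been challenged. The ENCODE project reported that about 80% of the genome shows some biochemical activity, although how much of this activity reflects a definite function is still debated.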