Integrated transcriptome map highlights structural and functional aspects of the normal human heart. Comparison with previous reports reveals substantial change in the number of known nuclear protein-coding genes (now 19,116), the protein-coding non-redundant transcriptome space [now 59,281,518 base pair (bp), 10.1% increase], the number of exons (now 562,164, 36.2% increase) due to a relevant increase of the RNA isoforms recorded. The unfolding of these instructions is initiated by the transcription of the DNA into RNA sequences. Google Scholar. doi: 10.1093/dnares/dsv028. Brief Bioinform. We first performed a protein-centric transcriptomics scan to define a revised set of human secreted proteins (secretome) based on 19,670 protein-coding genes predicted by Ensembl ().For each protein-coding gene, all protein isoforms (splice variants) were annotated on the basis of the presence of a signal peptide, transmembrane regions, or both, and each protein isoform was classified as being . Protein-coding genes: 646 to 719 It contains 133 million base pairs of nucleotides, or over 4% of the total. Maddon, P. J. et al. Now, let's filter to get only protein-coding genes, group by the ensembl gene ID, summarize to count how many transcripts are in each gene, inner join that result back to the original gene list, so we can select out only the gene, number of transcripts, symbol, and description, mutate the description column so that it isn't so wide that it'll break the display, arrange the returned data . Based on the transcriptomics profiles, cell lines were evaluated for their consistency to the corresponding TCGA (The Cancer Genome Atlas) disease cohort to help researchers to select the best cell lines as in vitro models for cancer research. The read counts of the 1055 cell lines were normalized by DESeq2 with respect to the size factor of each cell line and were further transformed by variance stabilizing transformation into log2 space. To test this, for the 27 cell line cancer types, gene expression was averaged per disease, resulting in the mean expression for each of the 27 cell line cancer types. J Cell Physiol. Chromosome 10, which makes up almost 4.5% of our DNA, is almost identical to chromosome 10 found in gorilla, orangutan and chimps. Below is a list of articles on human chromosomes, each of which contains an incomplete list of genes located on that chromosome. 2008;3:20. Identification of minimal eukaryotic introns through GeneBase, a user-friendly tool for parsing the NCBI Gene databank. Pseudogenes: 606 to 879. Kapustin Y, Souvorov A, Tatusova T, Lipman D. Splign: algorithms for computing spliced alignments with identification of paralogs. EXON NUMBER IN PROTEIN-CODING GENES Average number of exons in one gene Largest number in one gene Smallest number in one gene EXON SIZE IN PROTEIN-CODING GENES 16.6 kb Science. Following the opening of the data sets in a spreadsheet application, users have easy access to the whole set of current reviewed/validated data about human nuclear protein-coding genes. Using GeneBase, a software with a graphical interface able to import and elaborate National Center for Biotechnology Information (NCBI) Gene database entries, we provide tabulated spreadsheets updated to 2019 about human nuclear protein-coding gene data set ready to be used for any type of analysis about genes, transcripts and gene organization. Most of the sequences in the human genome do not code for proteins but generate thousands of non-coding RNAs (ncRNAs) with regulatory functions. Estimates of the current updates are closer to 20,000 protein-coding genes, as well as an expanding number of functional, non-coding RNA sequences. Scientists have since come. Finally the two ranking lists were combined, and cell lines were reordered according to their average rank. The UCSC genome browser database: 2019 update. Researchers often turn to model organisms to understand the complex molecular mechanisms of the human body. This small chromosome (less than 2.5%), measuring only 19 by 59 megabases in size, is pretty low key. A number of 2685 genes are classified as brain elevated and 202 genes were only detected in the brain. In order to provide reliable data, we focused on a curated subset of human nuclear protein-coding genes with a REVIEWED or VALIDATED Reference Sequence (RefSeq) status [1, 7]. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets. These data might also be used in comparative genomic studies when compared to similar data sets generated from different species to uncover specific and significant differences in genome and gene organization. p-arm Partial list of the genes located on p-arm (short arm) of human chromosome 3: . In other words, chromosome 14 usually determines how attractive a person can be. Provided by the Springer Nature SharedIt content-sharing initiative, Nature (Nature) The following is a partial list of genes on human chromosome 3. 2004. -. Pseudogenes: 381 to 400. If you continue, we'll assume that you are happy to receive all cookies. Journal of Translational Medicine Protein-coding genes: 795 to 912 Protein-coding genes: 739 to 822 Non-coding RNA genes: 246 to 830 Pseudogenes: 590 to 738 Chromosome 9 accounts for between 4% and 4.5% of our DNA cells. Open Access articles citing this article. The team followed up with a detailed molecular analysis which confirmed that the variant affects the expression of several cytoskeletal proteins and smooth muscle cell function. It is also not too different from chromosome 9 found in baboons and macaques. -, Haeussler M, Zweig AS, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Hinrichs AS, Gonzalez JN, et al. Comparatively smaller than Chromosome X, measuring at only 57 megabases in length and containing less than 1.5% of the human genome. A-proteins have hydrophobic amino acid compositions . Bioinformatics in the Era of Post Genomics and Big Data. The position of the longest intron is related to biological functions in some human genes. How has the classification of all protein-coding genes been done? Article Genetic code variants [ edit] Print 2016. 22 June 2021, Receive 51 print issues and online access, Get just this article for as long as you need it, Prices may be subject to local taxes which are calculated during checkout. Bethesda, MD 20894, Web Policies Article Protein-coding genes: 261 to 285 The human genome is massive, and contains over 30,000 protein-coding genes, as well as thousands more pseudogenes and non-coding RNAs. Finally, we confirm that there are no human introns shorter than 30 bp. Proc. Genome Biol. Pseudogenes: 633 to 819. doi: 10.1093/nar/gky1113. About 4000 human protein-coding genes are not mentioned in any scientific publication at all. The new human gene database contains 43,162 genes, of which 21,306 are protein-coding and 21,856 are noncoding, and a total of 323,824 transcripts, for an average of 7.5 transcripts per gene. Ensembl 2019. The https:// ensures that you are connecting to the Open Access "There are 3000 human . doi: 10.1093/database/baw153. Provided by the Springer Nature SharedIt content-sharing initiative. Cunningham F, Achuthan P, Akanni W, Allen J, Amode MR, Armean IM, Bennett R, Bhai J, Billis K, Boddu S, et al. Biol Direct. Contains encoding instructions for Acylamino-acid-releasing enzyme, 5-azacytidine-induced protein 2 and protein C3orf23. Nature 312, 767768 (1984). Join now Sign in Janne Bate's Post Janne Bate Principal Consultant at SRG Search by SRG - the data lead resource solution. Pseudogenes: 1,113 to 1,426. PhyloCSF scores are calculated based on codon substitution frequencies. A total of 155 protein-coding genes mapped to the GO term "regulation of immune system process"; 85 genes from C1, 32 genes from C3 and 38 genes from C5. Pseudogenes: 433 to 594. The description of each field is included in the first row of the spreadsheet table. Article The genes were classified according to specificity into (i) cancer enriched genes with at least four-fold higher expression levels in one cell line cancer type as compared with any other analyzed cell line cancer types; (ii) group enriched genes with enriched expression in a small number of cell line cancer types (2 to 10); and (iii) cancer enhanced genes with only moderately elevated expression. Accounting between 5.5% and 6% of our DNA, chromosome 6 is the site of the Major Histocompatibility Complex, which is the critical for the bodys adaptive immune system. Chromosome 13, with 3% of the bodys mapped human genome, is usually blamed for childhood obesity and delay in speech development. Here they are listed below in order of frequency (1 = most highly researched): TP53 - Encodes the tumour-suppressor protein p53, which is mutated in up to half of all human cancers. Non-coding RNA genes: 450 to 1,598 The site is secure. Cell 42, 93104 (1985). Invest. A tour through the most studied genes in biology reveals some surprises. 2016 Dec 26;2016:baw153. Correspondence to Next the team showed that the same proportion of human protein-coding genes remain a mystery. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. The reasons for the choice of the NCBI Gene database as a reference data source have been previously discussed in detail [6]. Fellowships for FA and MC have been funded by the Fondazione Umano Progresso DIMES N. 3997 24-11-2015, and individual donations acknowledged above. We are grateful to Kirsten Welter for her kind and expert revision of the manuscript. Extensive annotations were added to aid identification of differentially expressed genes, potential gene editing sites, and non-coding gene . How was the similarity of the cell lines to the corresponding TCGA cancer cohorts analysed? Protein-coding genes: 583 to 820 All authors critically discussed the final manuscript. Pseudogenes: 666 to 839. The activity of 43 CytoSig cytokines was inferred based on the gene expression profile of the 1055 cell lines by the package CytoSig (Jiang P et al. Non-coding RNA genes: 355 to 1,207 The assemblage of genes ND5 and ND6 was the worst of all, for which the length was 16% and 27% of the length of the whole gene, respectively. Also, DESeq2 normalized expression values were centered per gene as suggested. Nature 551, 427431 (2017). Human protein-coding genes and gene feature statistics in 2019, https://doi.org/10.1186/s13104-019-4343-8, http://creativecommons.org/licenses/by/4.0/, http://creativecommons.org/publicdomain/zero/1.0/. Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, Vazquez J, Valencia A, Tress ML. Data in the Transcripts.xlsx table include the same first five types of information provided in the Genes.xlsx table, plus RefSeq GenBank accession number for each transcript, length in bp of the whole transcript as well as of its 5 untranslated region UTR, coding sequence (CDS) and 3 UTR, number of exons and coding exons for that transcript, derived from the GeneBaseTranscripts table. The transcriptomics data was then used to. Aim: This study was undertaken with the aim to investigate the association of single nucleotide variants; namely . Advances in the Exon-Intron Database (EID). Gene statistics; Human genes; Protein-coding genes. 2023 Feb;55(2):209-220. doi: 10.1038/s41588-022-01276-9. Jobs People Learning Dismiss Dismiss. Once the taq polymerase starts to replicate DNA, the probe is destroyed and fluorescent material is released . Follow the Python code link for information about updates to the list of genes on these pages. Unauthorized use of these marks is strictly prohibited. Protein coding genes. . In: Abdurakhmonov IY, editor. We set out the expected frequency of ARE-containing genes at 25.55%, considering the ARE database (38) and 19,116 human protein coding genes (39). PhyloCSF is a method that determines the protein-coding potential of individual bases using alignments of the coding regions of multiple organisms representing a range of taxonomic groups. It is one of the only two allosome chromosomes (gender-determining chromosomes) in the human body. Protein class Gene ontology Length & mass Signal peptide (predicted) Transmembrane regions (predicted) MAN1A2-001 ENSP00000348959 ENST00000356554: O60476 [Direct mapping] Mannosyl-oligosaccharide 1,2-alpha-mannosidase IB . Nature 381, 661666 (1996). Piovesan A, Vitale L, Pelleri MC, Strippoli P. Universal tight correlation of codon bias and pool of RNA codons (codonome): the genome is optimized to allow any distribution of gene expression values in the transcriptome from bacteria to humans. Unable to load your collection due to an error, Unable to load your delegates due to an error. Sci. 2003, 460464 (2003). AP and PS designed the study, collected the data and performed the analysis. Non-coding RNA genes: 328 to 992 View/Edit Mouse. 2023 Jan 10;13:1085139. doi: 10.3389/fgene.2022.1085139. ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. 2001;291:130451. The funding sources had no role in the design of this study and collection, analysis, and interpretation of data and in writing the manuscript. They were derived from the GeneBase Genes table, including official Gene Symbol, Chromosome, Gene Type,and gene RefSeq status from the Gene_Summary related table. The second smallest of the lot, the 49 million base pair (1.5%) chromosome 22 has the distinction of being the first even chromosome to be completely sequenced (1999). -, Cunningham F, Achuthan P, Akanni W, Allen J, Amode MR, Armean IM, Bennett R, Bhai J, Billis K, Boddu S, et al. Actually, apart from three introns estimated to be of 13bp long due to NCBI Gene Gene Table artifacts [5], there is one unique intron smaller than 30bp, intron 14 of XBP1 gene, in these data. Measuring 82 megabases, chromosome 13 accounts for up to 3.5% of the human genome. The transcriptomics analysis covers 1055 human cell lines, corresponding to 27 cancer types, one non-cancerous group and one uncategorised group of cellines, and includes classification based on specificity, distribution and expression clusters. Here we provide a tabulated set of data about human nuclear protein-coding genes (genes, transcripts and gene features such as exons, coding portion of the exons and introns) derived from advanced parsing of NCBI Gene web site offered in a standard, ready-to-use spreadsheet format. PubMedGoogle Scholar, Dolgin, E. The most popular genes in the human genome. For this, read counts for HPA and CCLE cell lines quantified by Kallisto were re-analyzed without filtering out the non-protein-coding genes to ensure a broadened coverage of cancer pathway responsive genes. eCollection 2022. PubMed In addition, statistics based on these data and any subset generated from them may be used to tune genomic software requiring parameters about nuclear protein-coding gene, transcript or exon/intron number and length [15, 16]. Protein-coding genes: 45 to 73 Nucleic Acids Res. The UniProtKB/Swiss-Prot Homo sapiens proteome contains one representative . Before Coding Region Position: hg38 chr20:63,488,023-63,497,763 Size: 9,741 Coding . ADS Figure 1: Human species page. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Search human. Nucleic Acids Res. Protein-coding genes: 988 to 1,036 PubMed Central The red circles connected to each tissue name indicates the number of tissue enriched genes associated with that particular tissue. We use cookies to enhance the usability of our website. Then, the average expression per disease was further averaged as the disease baseline expression. London: IntechOpen; 2018. p. 1536. This site needs JavaScript to work properly. On the other hand, a genetic element could be transcribed, and thus identified as a functional gene, only under particular conditions such as a developmental stage, a disease or the exposure to specific stresses or drugs. A description about the classification of genes into the tissue enriched and group enriched categories is found here. Google Scholar. Nat Genet. GENCODE - Human Release 43 Human Release 43 (GRCh38.p13) Statistics of this release More information about this assembly (including patches, scaffolds and haplotypes) Go to GRCh37 version of this release GTF / GFF3 files Fasta files Metadata files NCBI Resource Coordinators. Chromosome 10 Protein-coding genes: 706 to 754 Non-coding RNA genes: 244 to 881 Pseudogenes: 568 to 654 A genomic coordinate list of these protein-coding genes is available as Table S1. An interactive network plot of the numbers of enriched and group enriched genes in all major organs and tissue types in the human body, connected to their respective enriched tissues. This lncRNA sequence is 2,913 nucleotides long and is found in Homo sapiens. Thus, three tables in the open standard format .xlsx (Microsoft, Seattle, WA), Genes.xlsx, Transcripts.xlsx and Gene_Table.xlsx, are provided here. Protein-coding genes: 1,124 to 1,199 Getting a list of protein coding genes in human Getting a list of protein coding genes in human 0 3.3 years ago fi1d18 4.1k Hi I have raw read counts extracted by htseq from STAR alignment I have both data with both Ensembl IDs and gene symbols, but I need only a latest list of protein coding genes in human; I googled but I did not find Comprehensive multi-omic profiling of somatic mutations in malformations of cortical development. Measuring 90 megabases in length, Chromosome 16 has exceptionally high gene density, particularly relating to genetic diseases in humans, which numbers about 150 out of the 90 million nucleotide sequences. Haeussler M, Zweig AS, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Hinrichs AS, Gonzalez JN, et al. The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Bookshelf Nature 312, 763767 (1984). Gene expression data were processed in the same way as for PROGENy analysis. Pseudogenes: 288 to 379. Gene disorders here are linked to diseases such as autism, EhlersDanlos syndrome and variants of dementia. Considering only upregulated DEGs or. official website and that any information you provide is encrypted The three data tables Genes.xlsx, Transcripts.xlsx and Gene_Table.xlsx have been released in the public repository Open Science Framework and they can be freely downloaded at the address: https://osf.io/mhda7/. Further analysis of transcriptome data and clinical data from cancer patients showed that recurrently p53-regulated lncRNAs are associated with patient survival. if a gene is enriched in cellines from a particular cancer type (specificity), which genes have a similar expression profile across the cell lines (expression cluster), the catalogue of genes elevated in each of the cell lines, which cell line has the most consistent expression profile to its corresponding TCGA disease cohort (i.e., the best cell lines for cancer study), cancer-related pathway and cytokine activity of each cell line, (i) classify the gene expression specificity in different cancer types and the distribution across all cell lines, (ii) evaluate the consistency between the cell lines and the corresponding TCGA disease cohort, (iii) estimate the cancer-related pathway (PROGENy) and cytokine (CytoSig) activity (with non-protein-coding genes included for calculation), (iv) find the highest correlating genes and further to classify all genes according to their cell line-specific expression. When the first draft of the human genome sequence published in 2001, there were approximately 30,000-40,000 protein-coding sequences. Non-coding RNA genes: 148 to 515 KJ901729 - Synthetic construct Homo sapiens clone ccsbBroadEn_11123 CCL25 gene, encodes complete protein. (ii) The enrichment of the TCGA cohort elevated genes (i.e., the union of enriched, group enriched, and enhanced genes in the TCGA cohort) in cell lines was evaluated by gene set enrichment analysis (GSEA). Using GeneBase, a software with a graphical interface able to import and elaborate National Center for Biotechnology Information (NCBI) Gene database entries, we provide tabulated spreadsheets updated to 2019 about human nuclear protein-coding gene data set ready to be used for any type of analysis about genes, transcripts and gene organization. Initial sequencing and analysis of the human genome. We provide here a tabulated set of data about human nuclear protein-coding genes that may be useful for human genome studies and analysis. Careers. Non-coding RNA genes: 245 to 973 Using the spreadsheet filtering and summarization functions (Excel for Mac 2011, Microsoft) or exploiting the search and calculation functions in GeneBase (FileMaker Pro) provided identical results in all cases. While the basic approach to obtain the data we present here is similar to the one followed in our previous study about the subject [6], there are two main differences. Although more than 90% of protein-coding genes in mouse have a 1:1 orthology relationship with a gene in human or rat, we also represent many-to-many 'orthology' relationships. Despite containing only up to 5.0% of the bodys DNA, chromosome 8 is quite important as over 8% of its genes are specialists in brain development. ISTOCK, BLACKJACK3D T he human genome may contain more protein-coding genes than prior analyses suggested. Chromosome 9 accounts for between 4% and 4.5% of our DNA cells. The concept is that genes that have an elevated expression in a TCGA cohort can be considered as the cohort signature, and their high expression should be reflected by cell line models. J. Clin. Protein-coding genes: 417 to 496 The best assembled were COX1, COX3, and ND4L, as they have collected more than 90% of the protein-coding-gene length. Genes contain nucleotides strands containing instructions on how to generate protein or RNA molecules. All these kinds of analyses depend on the chosen gene entry subset, the RefSeq classification system and are subject to the accuracy of the input dataset. Higher-order chromatin conformation forms a scaffold upon which epigenetic mechanisms converge to regulate gene expression [1, 2].Many genes are expressed in an allele-specific manner in the human genome, and this phenomenon is an important contributor to heritable differences in phenotypic traits and can be cause of congenital and acquired diseases including cancer [3, 4]. Non-coding RNA genes: 483 to 1,158 Depending on the genome-sequencing center, OLNs are only attributed to protein-coding genes, or also to pseudogenes, and also to tRNA-coding genes and others. Nucleic Acids Res. Privacy Chromosome 1 (human) Chromosome 2 (human) Chromosome 3 (human) Chromosome 4 (human) Chromosome 5 (human) Chromosome 6 (human) Chromosome 7 (human) Chromosome 8 (human) Chromosome 9 (human) Chromosome 10 (human) 2015;22:495503. 2013;101:2829. The team was left with 21,306 protein-coding genes and 21,856 non-coding genes many more than are included in the two most widely used human-gene databases. Abstract. Sci Rep. 2018;8:2977. The similarity between cell lines and the corresponding TCGA cohort was estimated by two different approaches: For all 1055 analyzed cell lines, the activity of a total of 14 cancer-related pathways were inferred using the PROGENy, a package that relies on biological data mining of publicly available data to obtain cancer-related pathway responsive genes for human and mouse (Schubert M et al. The largest of its kind, the Human Reference Interactome (HuRI) map charts 52,569 interactions between 8,275 human proteins, as described in a study published in Nature. Tu Q, Cameron RA, Worley KC, Gibbs RA, Davidson EH. Non-coding RNA genes: 707 to 1,924 This is the list of human protein-coding genes linked to SARS-CoV-2 infection and / or COVID-19 disease currently being targeted for re-annotation by GENCODE. Nucleic Acids Res. The orange circles indicate the number of genes with enriched expression in a group of tissues, connected by lines. Human Gene EEF1A2 (ENST00000706949.1) from GENCODE V43 . 2014;23:586678. Produces many zinc based proteins, such as ZBTB43 and ZNF79. Examples: HI0934, Rv3245c, ECs2657/ECs2658 GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics. Accounting for just one and a half percent of the human genome, chromosome 21 is infamous for its role in Down syndrome. 2019;47:D8538. Article Follow . Protein-coding genes: 804 to 874 Non-coding RNA genes: 271 to 1,060 The three main human databases (GENCODE/Ensembl, RefSeq, UniProtKB) contain a total of 22,210 protein-coding genes but only 19,446 of these genes are found in all three databases. In 2008, a draft of the complete human proteome was released from UniProtKB/Swiss-Prot: the approximately 20,000 putative human protein-coding genes were represented by one UniProtKB/Swiss-Prot entry each, tagged with the keyword 'Complete proteome' (now obsolete) and later linked to proteome identifier UP000005640.. Caracausi M, Ghini V, Locatelli C, Mericio M, Piovesan A, Antonaros F, Pelleri MC, Vitale L, Vacca RA, Bedetti F, et al. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. The 985 cancer cell lines were analyzed for their representability of the corresponding TCGA disease cohorts. 28S ribosomal protein L42, mitochondrial is a protein that in humans is encoded by the MRPL42 gene. Protein-coding genes: 739 to 822 Despite its massive size of 155 megabases, chromosome X only accounts for 5% of the human genome. Epub 2023 Jan 12. A study published last month (May 29) on BioRxiv provides an expanded database of approximately 5,000 novel genesof those, around 1,000 code for proteins, expanding the estimated number of protein-coding genes from around 20,000 to 21,000. In fact, scientists have estimated that there may be as many as 500,000 or more different human proteins, all coded by a mere 20,000 protein-coding genes. A genome-wide expression analysis of 1055 human cell lines, including 985 cancer cell lines, was performed using RNA-seq with early-split samples as duplicates. Among more than 60 different . You can filter the table results by gene type to show only protein-coding or non-coding genes, or search within the list of human genes by gene name or protein name. Dismiss. Tissues and organs are divided into groups according to functional features they have in common. 2016;25:252538. Integr Org Biol. of the ORF-K1 gene encoding a highly variable glycoprotein related to the immunoglobulin receptor family that maps at the extreme left-hand end of the HHV-8 genome. Use of a fluorescent probe which will bind to the target DNA if present (e. a specific gene's reverse transcribed mRNA). Human, non-human primates, domestic species and default for everything that is not a mouse, rat, fish, worm, or fly Full gene names are not italicized and Greek symbols are not used eg: insulin-like growth factor 1 Gene symbols Greek symbols are never used (e.g., TNFA, not TNF; PPARG, not PPAR ;) hyphens are almost never used Open Access For the remaining protein-coding genes, 39 to 86% of the length was assembled. If two predicted genes have been merged to form a new gene, both OLNs are indicated, separated by a slash. Filtering by the Yes annotation allows the retrieval of a non-redundant set of exons, coding exons and introns, respectively. This selection retrieved 19,116 genes, 46,932 transcripts and 562,164 exons. Internet Explorer). Search model organisms. 17 January 2023, Mammalian Genome Symp. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. The availability of the data sets presented here allows a ready update of main parameters about human genome, often cited in textbooks or reports without a source accounting for a rigorous method for extracting this information. The Human Protein Atlas project is funded. Correlation tests were used to identify relationships between gene length and other gene and protein characteristics.