ALFRED: allele frequency database
The ALlele FREquency Database
ALFRED is a resource of gene frequency data on human populations
supported by the Yale Center for Medical Informatics.

Overview
Criteria for Data Entry into ALFRED
Data submission to ALFRED - guidelines
Use of ALFRED
How to create URLs to ALFRED description pages
Technical Description
ALFRED Publications
ALFRED Presentations

Overview

ALFRED has been designed to make allele frequency data on anthropologically defined human population samples readily available to the scientific community and to link these polymorphism data to the molecular genetics-human genome databases. Initially, ALFRED contained primarily data generated in the laboratories of K.K. and J.R. Kidd in the Department of Genetics at Yale, including extensive unpublished data. Data from the published literature are being entered into ALFRED in a systematic way, with a focus on polymorphisms studied in many different populations. (Researchers wishing to have their data entered into ALFRED should contact us. Data sent in a suitable electronic format are much easier to include in ALFRED; see "Criteria for Data Entry into ALFRED" below.) ALFRED is distinct from databases such as dbSNP, which catalogs sequence variation: ALFRED's focus is on allele frequencies in diverse, anthropologically defined populations. It is not a compendium of human DNA polymorphisms but of frequencies of selected polymorphisms, with an emphasis on those that have been studied in multiple populations. All of the data in ALFRED are considered to be in the public domain and available for use in research and teaching.

ALFRED is a work in progress. The structure and functionality of ALFRED are being revised in an ongoing process as time allows improvements to be implemented. We are also routinely adding new data and links to other databases. Those of us on the ALFRED staff hope these data will be useful to others. We welcome comments on content, structure, and the interfaces available.

ALFRED is maintained by the ALFRED Staff.


Criteria for Data Entry into ALFRED

We feel that gene frequency data are only meaningful if the population sample is reasonably well defined and large enough for a reasonably accurate frequency estimate, and if the polymorphism is sufficiently defined to be replicable. Therefore, not all published gene frequency data will be included in ALFRED. Currently, there is no absolute minimum sample size, since samples that are small but come from sparsely represented areas and/or have data on multiple polymorphisms can be very useful. (The Nasioi sample, with 22 individuals, is a specific example of such a small sample studied for multiple markers.) We are only including samples that can be reasonably specified as to ethnicity, though this includes some highly heterogeneous groups such as "United States whites", which is the equivalent of "mixed European".

Because ALFRED focuses on gene frequency variation among populations, we are generally including frequency estimates only for polymorphisms that have been studied in at least six distinct population samples. Of course, there are exceptions, especially if a sample generally studied for multiple polymorphisms that meet the "at least six populations" criterion also has data on some unique polymorphisms. Similarly, various small sets of data are being entered when we know or expect that additional data are becoming, or already are, available for eventual entry into ALFRED. Our resources for entering data from the literature are quite limited, but if researchers can provide suitable electronic versions of their data we will be happy to add them to ALFRED.

The above criteria for inclusion in ALFRED are subject to change. We welcome comments from the scientific community interested in these data and will attempt to follow any consensus that emerges.


Data submission to ALFRED - guidelines

Only data on well defined population samples that are large enough to yield reasonably accurate frequencies, and on polymorphisms sufficiently defined to be replicable, can be included in ALFRED. If the submitted data lack any of this information, they lose much of their value. Therefore, we are looking for the most complete set of information for each population sample. As an aid to your ALFRED submission, we have listed below the guidelines that you should follow in order to provide the necessary data. In addition to the actual frequencies, we need further information about your publications, samples, loci, and sites. The sections below describe the other information we need in order to enter your data into ALFRED.

Publications
We need citations for any publications that contain the allele frequencies you are submitting. Your data will be linked to these publications. If your data are unpublished, they will be attributed to the researcher by name and marked as unpublished. If they are published in the future, you will need to send us the new citation so that the data can be linked to the appropriate publication.

Sample

ALFRED distinguishes samples of a population (a particular set of individuals) from the population itself. This allows future researchers to consider whether a difference, for example one shown by two different loci, can be attributed to the loci (because the same set of individuals was tested for both) or might be attributed to sampling differences (because different sets of individuals were tested).

To describe the sample sufficiently, we need to know about certain elements concerning your specific sample. The information you provide will also be used to aid in a complete description of the population from which the sample was collected. The following information is most important:

  • specific sample information when available (related/unrelated, age, men/women, etc.);

  • where the sample was collected;

  • geographic location of sample (city, village, etc.);

  • precisely who collected the sample (source);

  • the ethnic make-up of the sample;

  • the sample size (number of individuals);

  • any references to articles with published information concerning your particular sample;

  • language spoken.

If the population is not yet described in ALFRED, additional information, if available, can also be useful in describing the population. Such information would include:

  • population name;

  • subsistence practices;

  • historical information;

  • religion;

  • any other cultural traditions;

  • any ethnographic citations.

If not all of this information is available, please give as complete information as possible.

Locus:

Please provide the official name and symbol for the gene in which the site is located. For the official gene symbol, go to the HGNC website http://www.genenames.org/cgi-bin/hgnc_search.pl.

If you cannot find an official gene symbol for the site, please give us as much information as possible. For intergenic sites, it is helpful to have the name of the closest gene and an indication of the distance from that gene, 5' or 3'.

Site (polymorphism):
  • Specify the polymorphism's location within the gene if available (e.g. "intron 3");
  • Specify nucleotide/amino acid variations if available;
  • Specify the restriction enzyme used to detect this polymorphism if it is an RFLP;
  • Give the GenBank accession number and/or dbSNP ss# (rs#) if available;
  • Give any other information that helps describe this site;
  • Give 10-15 bp of flanking sequence on both sides of the variant if available.

Allele:

  • For RFLP, use site absent and site present for the two allele names;
  • For STRP and VNTR, use number of repeats as allele names. If you use fragment size as allele names, please provide typing protocol and specify primer sequences;
  • For Ins/Del, use insertion and deletion as allele names;
  • For SNP, use nucleotide variants as allele names. For example, A/G SNP has A and G for the two alleles.

Excel Spreadsheet:

The Excel spreadsheet provided can be used for submitting data to the ALFRED Staff at alfred@yale.edu. The information to be provided in the spreadsheet is as follows:

  • Sample name: name of sample you used in publication;
  • Number of chromosomes: number of chromosomes in the population sample;
  • Locus symbol: official gene symbol;
  • Site name: site name used in publication;
  • Allele name: see above;
  • Frequency: frequency value;
  • Typed number of chromosomes: number of chromosomes actually typed for that marker in that population sample (may be less than the sample size because of missing data);
  • Typing method: method used for typing each polymorphism.
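
Before sending a spreadsheet, it can help to check the rows for internal consistency. The following is only a sketch of such a check, not part of ALFRED: the class name, field names, sample values, and the 0.01 tolerance are all assumptions made for illustration, and the real spreadsheet's column headers may differ. It checks that each frequency lies between 0 and 1, that the typed chromosome count does not exceed the sample's chromosome count, and that the allele frequencies for each sample and site sum to roughly 1.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FrequencyRow:
    """One spreadsheet row. Field names are illustrative, not ALFRED's headers."""
    sample_name: str
    chromosomes_in_sample: int
    locus_symbol: str
    site_name: str
    allele_name: str
    frequency: float
    chromosomes_typed: int
    typing_method: str


def check_rows(rows, tolerance=0.01):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    totals = defaultdict(float)
    for row in rows:
        key = (row.sample_name, row.locus_symbol, row.site_name)
        if not 0.0 <= row.frequency <= 1.0:
            problems.append(f"{key}: frequency {row.frequency} is outside [0, 1]")
        if row.chromosomes_typed > row.chromosomes_in_sample:
            problems.append(f"{key}: typed chromosomes exceed the sample's chromosomes")
        totals[key] += row.frequency
    # Allele frequencies at one site in one sample should add up to about 1.
    for key, total in totals.items():
        if abs(total - 1.0) > tolerance:
            problems.append(f"{key}: allele frequencies sum to {total:.3f}, not 1")
    return problems


# Hypothetical two-allele SNP typed in a hypothetical sample.
rows = [
    FrequencyRow("ExampleSample", 44, "ADH4", "ExampleSite", "A", 0.75, 40, "PCR-RFLP"),
    FrequencyRow("ExampleSample", 44, "ADH4", "ExampleSite", "G", 0.25, 40, "PCR-RFLP"),
]
print(check_rows(rows))  # an empty list means no problems were found
```

The tolerance allows for rounding in published frequencies; a stricter or looser value may suit a particular data set.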

Sample, locus, and site information can be submitted in a Word file or a comparable word processing file. If you have problems using the Excel spreadsheet provided, please let us know and we will work with you to find a more useful method of submitting your data. If you have any questions, comments, or suggestions about any aspect of ALFRED, please feel free to contact us.


Use of ALFRED

Help is available within ALFRED and we hope the searching procedures are fairly intuitive; we welcome comments if you find some aspect is not obvious or does not work as expected. The following are some hints about ways we think you can get a quick overview of what is in ALFRED.

Search
There are several ways of searching ALFRED. The following search options are available from the tabbed menu:
Basic Search
Loci
Population
Map Interface
Keyword Search

  • Basic Search
    • Search for detailed information using a UID

      Unique IDentifiers (UIDs) are used in ALFRED to access specific records from various tables. A UID is a text string consisting of three parts: a Table Identifier, a Record Number, and a Check Character. The Table Identifier is a two-character symbol representing the table the record belongs to, such as PO for POpulation, SI for SIte, and so on. The Record Number is an internal identifier for the specific record. The Check Character is a simple checksum of the digits in the Record Number: it is determined by summing the digits of the Record Number, taking that sum modulo 26, and representing the resulting number as an upper-case ASCII character (A-Z). For example, the digits of the locus UID LO000423J sum to 9, which corresponds to J when counting from A = 0.

      UIDs are stable for a particular record, providing a dependable means for users to access data. Submitters to ALFRED are encouraged to publish the appropriate UIDs with their data so that readers can reliably access the relevant data.

    • Search for publications using an author's last name

      Users can search ALFRED by the last name of a publication's author. The search results display a list of citations by that author, each with a link to the appropriate frequency table(s). By following the links, the user can get the allele frequency tables linked to the particular publication. An example is a search for the author last name 'Osier'.

    • Search for frequencies

      Searches can be performed using a combination of fields, including gene symbol, polymorphism name, typing method, population name, and entry date, to retrieve frequencies.
  • Loci
  • Population
    • Follow the links Geographic Region - Population.
    • Selecting a population brings you to the Population Information page. Information regarding population samples is also provided on this page.
  • Map Interface
    The GIS Map Interface is the newest function in ALFRED. There are multiple ways of searching the Map Interface:
    • Search by browsing population names.
    • Search for loci and sites by chromosome number.
    • Keyword search using an Official Gene Symbol, ALFRED locus name, ALFRED polymorphism name, dbSNP rs#, GDB ID, geographic region, or population name.
    • Search using an ALFRED UID.
    There are multiple ways of viewing ALFRED data on the Map Interface:
    • View all the ALFRED populations.
    • View a selected population, for example Biaka.
    • View allele frequency pie charts for a selected site, for example ADH1A, intron 8 C/T (BccI).

  • Keyword Search
    The Keyword Search function lets a user query the database using a list of keywords separated by semicolons. There are two types of search:
    • Search for entries in ALFRED.
      Fields available for keyword search are official gene symbols, ALFRED loci names, dbSNP rs#s, GDB IDs for loci, population names, ALFRED sample names, ALFRED polymorphism names, and ALFRED geographic names.
    • Search for frequency tables using gene and population names.
      The Search for Frequency Tables function lets a user query the database using a list of gene names or gene symbols and population names separated by semicolons. The search result is a genes × populations matrix indicating the presence or absence of frequency tables for each combination. An example list of entries for a frequency search would be Biaka ; Danes ; Druze ;Yakut; drd3 ; ami ; dm1
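
When a keyword list is assembled in a script before being pasted into the search box, it can be tidied first. The sketch below is an illustration only: the function name is hypothetical, and it assumes that whitespace around the semicolons is not significant, which this page does not state. It splits the example list above into its individual terms.

```python
def split_keywords(query: str) -> list[str]:
    """Split a semicolon-separated query, trimming spaces and dropping empty items."""
    return [item.strip() for item in query.split(";") if item.strip()]


query = "Biaka ; Danes ; Druze ;Yakut; drd3 ; ami ; dm1"
terms = split_keywords(query)
print(terms)
# ['Biaka', 'Danes', 'Druze', 'Yakut', 'drd3', 'ami', 'dm1']

# Re-join with uniform separators, for example before pasting into the search box.
print("; ".join(terms))
```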


    Data display and accessibility

    There are several ways allele frequency data can be viewed and downloaded from ALFRED. Data in ALFRED are available for download in several formats:
    • Allele frequencies for individual polymorphisms can be downloaded in semicolon-delimited format.
    • The entire database can be downloaded in XML format. Depending on requirements, a researcher can download the entire database with or without descriptions, or download the tables separately in XML format.
    • Frequency, population, and polymorphism information is available in downloadable text files. Data in these files can be imported into Excel spreadsheets for further analysis.


    Summaries

    ALFRED offers multiple summary tables.


    Documentation

    The ALFRED team puts together a newsletter that is emailed to all registered users on a regular basis. The newsletter keeps users up to date with new functions, recent data uploads, and tips on using ALFRED. Other documentation files are available from ALFRED's 'Documentation' tab.

    How to create URLs to ALFRED description pages

    Creating URL links to description pages in ALFRED is straightforward.

    URL to locus description page
    • Download ALFREDGeneInfo.csv from the 'Summaries' -> 'Downloads' menu tab.
    • The file ALFREDGeneInfo.csv lists each alfred_uid with the corresponding entrez_gene_id and gene_symbol. The file is comma-delimited.
    • Every record in ALFRED has a unique identifier (UID). A UID is a text string consisting of three parts: a Table Identifier, a Record Number, and a Check Character. For example, LO000423J is the UID for the locus ADH4. To create a URL, append the UID to the base URL http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID= so that the URL has the form http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID=UID (where UID is replaced by the actual UID value). Thus the complete URL for the ADH4 page is http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID=LO000423J
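
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than an official client: it assumes the column headers in ALFREDGeneInfo.csv are exactly alfred_uid, entrez_gene_id, and gene_symbol, and it uses a one-row inline stand-in for the downloaded file (the Entrez Gene ID in that row is filled in for illustration). The function name is hypothetical.

```python
import csv
import io

BASE_URL = "http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID="

# Stand-in for the contents of ALFREDGeneInfo.csv; the header names are assumed.
SAMPLE_CSV = """alfred_uid,entrez_gene_id,gene_symbol
LO000423J,127,ADH4
"""


def locus_urls(csv_text):
    """Map each gene symbol to the URL of its ALFRED locus description page."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["gene_symbol"]: BASE_URL + row["alfred_uid"] for row in reader}


print(locus_urls(SAMPLE_CSV)["ADH4"])
# http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID=LO000423J
```

To use a downloaded file instead of the inline sample, read its text with open("ALFREDGeneInfo.csv", newline="") and pass the contents to the same function.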

    URL to site description page
    • Download ALFREDVariantInfo.csv from the 'Summaries' -> 'Downloads' menu tab.
    • The file ALFREDVariantInfo.csv lists each alfred_uid with the corresponding dbSNP rsnumber. The file is comma-delimited.
    • Every record in ALFRED has a unique identifier (UID). A UID is a text string consisting of three parts: a Table Identifier, a Record Number, and a Check Character. For example, SI014100G is the UID for the rsnumber rs1126670. To create a URL, append the UID to the base URL http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID= so that the URL has the form http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID=UID (where UID is replaced by the actual UID value). Thus the complete URL for the rs1126670 page is http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID=SI014100G
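
Because a UID carries its own check character, a script can catch a mistyped UID before building a link. The sketch below follows the check character rule given under Basic Search (sum the digits of the Record Number, take the sum modulo 26, map to A-Z). Two points are assumptions: that a remainder of 0 maps to A, which is consistent with both example UIDs on this page, and that the Record Number may be any run of digits. The function names are hypothetical.

```python
import re

BASE_URL = "http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID="

# Two-letter table identifier, digits of the record number, one check letter.
UID_PATTERN = re.compile(r"([A-Z]{2})(\d+)([A-Z])")


def check_character(record_number: str) -> str:
    """Sum the digits, take the sum modulo 26, and map 0..25 to A..Z."""
    digit_sum = sum(int(digit) for digit in record_number)
    return chr(ord("A") + digit_sum % 26)


def record_url(uid: str) -> str:
    """Return the description-page URL for a UID, rejecting malformed UIDs."""
    match = UID_PATTERN.fullmatch(uid)
    if match is None:
        raise ValueError(f"not a UID: {uid!r}")
    _table, number, check = match.groups()
    if check_character(number) != check:
        raise ValueError(f"check character mismatch in {uid!r}")
    return BASE_URL + uid


print(check_character("000423"))  # J, as in LO000423J
print(check_character("014100"))  # G, as in SI014100G
print(record_url("SI014100G"))
# http://alfred.med.yale.edu/alfred/recordinfo.asp?UNID=SI014100G
```

A digit-sum checksum catches a single mistyped digit but not transposed digits, so a passing check does not guarantee that the record exists.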

    Technical Description

    Initially, to achieve rapid prototyping for testing new structures and functionality, ALFRED was implemented using Microsoft Access, an SQL-compliant, microcomputer-based relational database package. The current version of ALFRED is implemented as a relational database using Oracle version 8.1.7.4 on one of Yale's institutional database servers, where it is maintained and backed up on a regular basis. The Web front end is built using Active Server Pages (ASP). Most of the user interface code is written in Visual Basic script (VBScript), and database access is implemented using Open Database Connectivity (ODBC). One advantage of using ASP is the ease with which data from databases may be accessed and published to the Web through the use of ActiveX object components (e.g., ActiveX Data Objects, or ADO). In addition, Visual Basic scripting is easy to learn. While a very small amount of client-side code (in JavaScript) is used, most of our code runs on the server side with ASP. We have minimized client-side coding to avoid incompatibilities among different types and versions of Web browsers. We are using Internet Information Server (IIS) as our Web server (ASP is a part of IIS), which runs on Windows 2000.


    ALFRED Publications

    1. Cheung KH, Miller PL, Kidd JR, Kidd KK, Osier MV, Pakstis AJ. "ALFRED: a Web-accessible allele frequency database." Pac Symp Biocomput. 639-50. (2000)

    2. Cheung KH, Osier MV, Kidd JR, Pakstis AJ, Miller PL, Kidd KK. "ALFRED: an allele frequency database for diverse populations and DNA polymorphisms." Nucleic Acids Res. 28(1):361-3. (2000)

    3. Osier MV, Cheung KH, Kidd JR, Pakstis AJ, Miller PL, Kidd KK. "ALFRED: an allele frequency database for diverse populations and DNA polymorphisms--an update." Nucleic Acids Res. 29(1):317-9. (2001)

    4. Osier MV, Cheung KH, Kidd JR, Pakstis AJ, Miller PL, Kidd KK. "Expansion of ALFRED, the ALlele FREquency Database." Am J Phys Anthropol. Annual Meeting Issue: Supplement 34:94. (2002)

    5. Osier MV, Cheung KH, Kidd JR, Pakstis AJ, Miller PL, Kidd KK. "ALFRED: an allele frequency database for Anthropology." Am J Phys Anthropol. 119:77-83. (2002)

    6. Osier MV, Cheung KH, Rajeevan H, Pakstis AJ, Kidd JR, Miller PL, Kidd KK. "ALFRED (ALlele FREquency Database): A Resource for genetic anthropology and human population genetics." Human Origins & Disease. CSHL Meeting Abstracts of Papers. 43. (2002)

    7. Rajeevan H, Osier MV, Cheung KH, Deng H, Druskin L, Heinzen R, Kidd JR, Stein S, Pakstis AJ, Tosches NP, Yeh CC, Miller PL, Kidd KK. "ALFRED – the ALlele FREquency Database – update." Nucleic Acids Res. 31(1):270-271. (2003)

    8. Kidd KK, Rajeevan H, Osier MV, Cheung KH, Deng H, Druskin L, Heinzen R, Kidd JR, Stein S, Pakstis AJ, Tosches NP, Yeh CC, Miller PL. "ALFRED – the ALlele FREquency Database – update." Am J Phys Anthropol. Annual Meeting Issue: Supplement S36:128. (2003)

    9. Rajeevan H, Cheung KH, Gadagkar R, Stein S, Soundararajan U, Kidd JR, Pakstis AJ, Miller P, Kidd KK. "ALFRED: An allele frequency database for Microevolutionary Studies." Evolutionary Bioinformatics Online. 2005:1. (2005)

    10. Rajeevan H, Soundararajan U, Kidd JR, Pakstis AJ, Kidd KK. "ALFRED: an allele frequency resource for research and teaching." Nucleic Acids Res. 40(D1):D1010-D1015. (2012)

    ALFRED Presentations (PowerPoint)

    1. The American Association of Physical Anthropologists (AAPA) 2003.
    2. The American Society of Human Genetics (ASHG) 2003.
    3. The Pacific Symposium on Biocomputing (PSB) 2004.
    4. The American Society of Human Genetics (ASHG) 2004.
    5. The American Society of Human Genetics (ASHG) 2005.
    6. The Pacific Symposium on Biocomputing (PSB) 2006.
    7. The Pacific Symposium on Biocomputing (PSB) 2007.
    8. The American Society of Human Genetics (ASHG) 2008.
    9. The American Society of Human Genetics (ASHG) 2009.

    © 2019 Kenneth K Kidd, Yale University. All rights reserved. The full Copyright Notification is also available.
    Originally prototyped by Michael Osier with the aid of Kei Cheung
    Upgrades and maintenance since 2002 by Haseena Rajeevan