Talk:Protein Data Bank

WikiProject Computational Biology (Rated C-class, High-importance)
This article is within the scope of WikiProject Computational Biology, a collaborative effort to improve the coverage of Computational Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the quality scale.
This article has been rated as High-importance on the importance scale.
 
WikiProject Molecular and Cell Biology (Rated C-class, Mid-importance)
This article is within the scope of the WikiProject Molecular and Cell Biology. To participate, visit the WikiProject for more information.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
 

Untitled

This is more of a word of warning: viewing structures via the PDB seems to work best with Netscape 4.7x. I have had no luck with IE, and some of the Chime-dependent display programs warn you that Netscape 6 won't work either. - David M

More detail and references needed in history section

From the article: "The PDB is a key resource in structural biology and is critical to more recent work in structural genomics." Some references here for work that uses the PDB in an interesting/important way would be good.

From the article: "Countless derived databases and projects have been developed to integrate and classify the PDB in terms of protein structure, protein function and protein evolution." Such as? Give some examples for the derived databases.

From the growth section: "The growth rate of the PDB has been the subject of fairly extensive analysis." Such as? This needs referencing. Why was it subject to extensive analysis, and where are these analyses?

I agree that these statements should be cited, but I don't agree that a lack of citation is a reason to 'dumb down' the article. I don't think any structural biologists or crystallographers would have too much trouble accepting the above facts. Nevertheless, it seems Wikipedians are always willing to take it upon themselves to dumb down an article, rather than to educate themselves! I'll try to find some references for these facts (or at least some citable examples) and reinstate them in the article. --Dan|(talk) 15:59, 23 May 2009 (UTC)

Raw data?

Knowing the amount of modeling that goes into a structure deposited in the PDB, I would hardly call the coordinate files in the PDB 'raw data'.

Protein Data Bank (file format) needed

A new page, Protein Data Bank (file format), is needed. It should cross-link to Chemical file format and also use the proper [[Category:Chemical file format]] category. JKW 15:58, 8 April 2006 (UTC)

An initial Protein Data Bank (file format) page has been created; anything related to format discussions on Protein Data Bank should be moved to that page. JKW 11:20, 22 April 2006 (UTC)
I agree we should move junk from the file format section on this article to the file format article. --Dan|(talk) 13:41, 7 March 2008 (UTC)

Rewrite

I believe all the concerns above have been considered in the revision of the article today. I think everything is referenced, though sometimes one reference covers an entire paragraph. --Christopher King (talk) 03:59, 5 January 2009 (UTC)

See my comment above in the section #More detail and references needed in history section. --Dan|(talk) 16:00, 23 May 2009 (UTC)

Not public domain

I removed from the lead paragraph the claim that all information from PDB.org (originally written as PDB.com) is in the public domain. This claim is simply false. RCSB is partially responsible for this confusion, but it's important to note that nowhere do they indicate that the material is in the "public domain". Much of their material comes from a large variety of sources, and there's no evidence that they even have the authority (much less the resources) to place it all into the public domain.

In particular, note these restrictions detailed at "Advisory for the Use of the PDB Archive" that make the content unacceptable for Wikipedia (and Commons):

  • "Redistribution of modified data files using the same file name as is on the FTP server is prohibited."
  • "The user assumes all responsibility for insuring that intellectual property claims associated with any data set deposited in the PDB archive are honored."

And note these restrictions at "Policies & References"

  • "By using the materials available in the PDB archive, the user agrees to abide by the conditions described in the PDB Advisory Notice."
  • "Molecule of the Month illustrations are copyrighted. They are available for educational purposes, provided attribution is given to David S. Goodsell and the RCSB PDB. Molecule of the Month articles are copyrighted by the RCSB PDB and the authors of the article. Text can only be reprinted with permission, with attribution, and without the right to manipulate or change its content."

Danorton (talk) 05:07, 15 April 2009 (UTC)

PDB.com has nothing to do with pdb.org or RCSB (I'm sure that was just a typo). However, I think I might challenge pdb.org about those claims of copyright (not on the "Molecule of the Month" images; the actual PDB files). They have no basis for copyright over PDB files. --Thorwald (talk) 08:03, 15 April 2009 (UTC)

Need expansion/section for PDB Identifier

There needs to be more information about PDB Identifiers. I am a hobbyist editor and not a biochemist so I do not know the origins of the identifier or who started this naming format. Who came up with the identifier design and method? Who gets to decide what proteins get added to the series?

Are there "reserved regions" of the identifier for certain content, or are new structures just added serially as they arrive from researchers?

As far as I can determine, it is a base-36 naming system using numbers 0-9 and letters A-Z. If it is limited to just four characters, as currently stated in the text of this article, that is only 36^4 = 1,679,616 possible PDB entries.

Does anyone seriously believe that there will never be more than 1.7 million protein structures found and mapped, across the entire history of life on the planet?

It would make more sense if additional characters could be added as needed. A fifth character would allow 60,466,176 total patterns, a sixth would allow 2,176,782,336 patterns, etc.

DMahalko (talk) 23:00, 20 May 2009 (UTC)

Hey DMahalko, your understanding of the code is correct. I don't know how it was decided on, but that is how it is. In the past authors picked their own codes. These days they are automatically assigned by the submission software. The limited number of codes has been discussed on and off over the years on the PDB-L mailing list https://lists.sdsc.edu/mailman/listinfo.cgi/pdb-l (where I'm sure someone would answer any questions you have on the above). You may also like to search (or even improve) the unofficial PDB FAQ here: http://pdbwiki.org/index.php/PDB_FAQ HTH --Dan|(talk) 16:06, 23 May 2009 (UTC)
Actually PDBWiki has an article all about the "PDB code" http://pdbwiki.org/index.php/PDB_code --Dan|(talk) 16:08, 23 May 2009 (UTC)
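
The capacity figures quoted in this thread can be checked with a short calculation. This is only a sketch: it assumes, as the original comment does, that every position in the code may be any of the 36 characters 0-9 and A-Z, and the function name id_capacity is made up for illustration. If any position is restricted in practice, the real capacity is lower than these numbers.

```python
# Upper bound on the number of distinct identifiers of a given length,
# ASSUMING every position may hold any of 36 characters (0-9, A-Z).
ALPHABET_SIZE = 36  # 10 digits + 26 letters


def id_capacity(length: int) -> int:
    """Return the number of distinct base-36 codes with exactly `length` characters."""
    return ALPHABET_SIZE ** length


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(f"{n} characters: {id_capacity(n):,} possible codes")
```

Each added character multiplies the capacity by 36, which is why a fifth or sixth character changes the picture so dramatically.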

Physical Location

The physical location deserves to be mentioned or listed as coordinates. I may be mistaken, but I think the sole location for the repository is at Rutgers University (http://www.biomaps.rutgers.edu/index.php?option=com_content&task=section&id=1&Itemid=2 third paragraph). I've seen a wing of a building at Rutgers marked Protein Data Bank: (40.524497°, -74.461634°). Pulu (talk) 18:37, 15 July 2009 (UTC)

Growth trend

The growth trend section links to the official PDB site.

However, the data there start in 1976, while the plot starts in 1972. From 1972 to 1976 the plot would thus show 0 entries, which does not make much sense to me.

If SEARCH was the original database, I have another source that states that the PDB started with 7 entries. Could additional verification be provided to explain why the data start at 1976, and which entries were the first ones? 80.110.81.222 (talk) 21:06, 30 May 2015 (UTC)

Assessment comment

The comment(s) below were originally left at Talk:Protein Data Bank/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

Although I feel that this article is more than a stub, I don't think it matches the criteria for start, being a general introduction to an incomplete range of topics (i.e. no "subheading that fully treats an element of the topic" nor "multiple subheadings that indicate material that could be added to complete the article"). Secondly, although this article is Top importance as far as I am concerned, it is Mid or Low on your scale. --Dan

Last edited at 13:40, 7 March 2008 (UTC). Substituted at 03:28, 30 April 2016 (UTC)

A lot, then less, then the same again?

"100 registered structures milestone in 1982, the 1,000 in 1993, the 10,000 in 1999, and the 100,000 in 2014" What? 100, 1, 10 and then 100 again? Is this correct? Or was this written by somebody who does not know the ISO standard for thousands separators? - Preceding unsigned comment added by 78.67.250.137 (talk) 19:34, 21 June 2017 (UTC)