
MITOMAP Data Curation

This project has been archived. Data curation is still an important concern, but this was an attempt to create an Open Office client tool for the MITOMAP curator. Because our informatics efforts have become so web-based, a curation tool implemented using AJAX seems a better choice. The work of this project is now consolidated back into the MITOMAP project.


Description

Improve the harvesting of mtDNA data for MITOMAP.

Notes

To Do

mitocurator - OO used to open the extracted variation list and a BASIC macro makes the appropriate spreadsheet entries
mitoextractor - Perl program using wxCocoaDialog for data extraction (can this use FileIO.pm?, RefSeq.pm? . . ., other Mitomaster integration)
crowd sourcing - Not sure exactly. Some ideas:
- expand the submission of data
- allow comments on data
create a trigger in the mito database to generate edit dates and remove these tables from Mitomap.xls

Curation Tools

~~Get OO setup on Marie's computer w/ database connection~~
~~Test the use of Strawberry perl with wxCocoaDialog on my computer~~
~~Get Strawberry Perl setup on Marie's computer~~
mitoextractor
- ~~create prototype file selection executable~~
- add file validation code (rework FileIO.pm)
- add code to parse HTML
- add code to extract variants from phylotree file
- package with pp - http://search.cpan.org/~smueller/PAR-Packer-0.982/lib/pp.pm
mitocurator (mitomap_prototype) - focus on regular polymorphism curation, the references, then clinical variants
- ~~create a prototype GUI that is passing values~~
- ~~create OO BASIC data structure for the rCRS and use to automatically give the ref allele to the curator~~
- ~~add some dummy data~~
- ~~create a trigger function that will populate the Variant ID field for variants that have been previously seen~~
  - ~~must search not only the polymorphism sheet, but also the clinical spreadsheets~~
  - ~~give a descriptor value that indicates how the previously reported variant was classified~~
- ~~create function that will populate the 'Previously Reported' field (11/19)~~
  - ~~get all references from the appropriate linker table~~
  - ~~use a format that identifies the type of citation and the authors~~
- implement the predicted 'Coding Change' (11/28)
  - create OO BASIC data structure for coding changes
    - implement this in RefSeq.pm and then generate the declarations in OO BASIC, alternatively implement the declarations in psql and begin to make greater use of database function calls
  - also make use of this to generate the complete set of mtDNA alleles (complete set of allele topic pages)
  - create a trigger function that uses the data structure to give feedback to the curator
- implement the actual spreadsheet entries on submitting a variant (12/5)
  - use the max ID value of the spreadsheet
  - add automatic updating of the edit date values
- implement managing the 'Note' field (12/8)
- implement data syncing - mitosync (12/12)
  - adjust mito database to authenticate a mitoadmin connection
  - test DML in OO
  - sync macro based on max. value of primary key
- rewrite rebuild script for OO format (12/23)
  - how best to parse OO Calc format? perl modules??
  - refactor FileIO.pm again!!
- add support for clinical field entries (12/29)
- auto-increment reference ID values when Marie pastes a new reference from biglist (1/5)
  - can this be done using a cell definition?
  - trigger on the spreadsheet?
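The "max ID" idea that appears twice in the list above (new spreadsheet entries and auto-incremented reference IDs) can be sketched in a few lines. This is a Python illustration, not the OO BASIC the project used; `next_id` is a hypothetical helper name, and it assumes the IDs are positive integers held in one column.

```python
def next_id(existing_ids):
    """Return the next free ID: one more than the current maximum, or 1 if there are none."""
    return max(existing_ids, default=0) + 1


# Dummy ID column, not real MITOMAP data.
print(next_id([2298, 2300, 2299]))
```

Taking the maximum rather than the row count keeps the result correct even when the sheet is unsorted or has gaps left by deleted rows.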

AJAX Curation Tool

The mitocurator interface might be implemented just as easily (maybe more easily) using AJAX instead of OpenOffice. It doesn't look too hard:

Create a webpage that makes the XMLHttpRequest using one of the JavaScript frameworks
Create a Perl script on the server that receives the request, queries the database, and returns the data as JSON
These could probably be facilitated with one of the Perl frameworks (Catalyst, Gantry, Mojolicious, or Titanium). Probably better to use a light-weight framework with Foswiki.
Though PostgreSQL can be induced to return XML, there seems to be a reasonable argument for using JSON, which has a very simple structure.
There are also numerous Perl modules for working with JSON.
AJAX Perl article - good simple example
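
The server-side half of the steps above (receive a request, query the database, return JSON) might look roughly like the sketch below. It is written in Python rather than Perl, and it uses an in-memory SQLite database as a stand-in for the PostgreSQL mito database. The `variant` table, its columns, the rows, and the function name `variants_as_json` are all assumptions made for illustration.

```python
import json
import sqlite3


def variants_as_json(conn, position):
    """Query the variants at one position and serialize the rows as a JSON string."""
    cur = conn.execute(
        "SELECT id, position, ref, alt FROM variant WHERE position = ? ORDER BY id",
        (position,),
    )
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    return json.dumps({"position": position, "variants": rows})


# Stand-in database with made-up rows, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant (id INTEGER PRIMARY KEY, position INTEGER, ref TEXT, alt TEXT)"
)
conn.executemany(
    "INSERT INTO variant (position, ref, alt) VALUES (?, ?, ?)",
    [(100, "A", "G"), (100, "A", "T"), (200, "C", "T")],
)
print(variants_as_json(conn, 100))
```

The page-side JavaScript would then only need to parse the returned string and fill in the form fields.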

Crowdsourcing

How can we utilize community contributions to increase the quantity and accuracy of our data?

Variant Pages - Create a wiki page for each mtDNA variant. Link to MITOMAP information, output from MITOMASTER, and include a section curated by the community.

Log

9/14/2009
Did some work getting AJAX functionality working with Foswiki's JQuery plugin. Very nice! This certainly seems like the way to go for a curation tool. No need to learn OOBasic, just do all the updates on a "hot" database within the system. Curator has nothing to install, maybe this could lead to crowd-sourcing. Also easier to implement. Plan: implement a prototype of what I've done in OpenOffice on a page accessible only to Marie. Get some feedback to make a final decision.

7/4/2009
Killed Open Office framework project and incorporated that work into this project. Seems that OpenOffice serves a good niche for data management, but I'll stick to web-based or command-line apps for most of my work.

4/3/2009
Ideas for improving data curation:

curation tool - create standalone perl utilities and macros in OpenOffice that relieve Marie from some of the tedium of adding new data
crowd sourcing - can we use social networking to have the community assist in data curation (e.g. mirroring MITOMAP tables in Mitowiki)
mining - create programs to extract data from current sources and to derive new information
standardization - any possibility of implementing a data standard that would allow automated extraction of data from published works?

11/14/2008
Updating the 'Previously Reported' value is working for my test data. Some concern over the number of operations being performed when checking a variant. May require optimization for the full data set. Maybe begin by sorting the rows of the sheets. Maybe call the number format conversion function at the beginning.
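
One way the sorting idea could cut the number of operations: sort the lookup keys once, then binary-search them for each variant instead of scanning every row. The sketch below is Python rather than OO BASIC, and the row layout (`position` and `change` keys) and both function names are hypothetical.

```python
from bisect import bisect_left


def build_sorted_keys(rows):
    """Sort the (position, change) keys once so later lookups can use binary search."""
    return sorted((row["position"], row["change"]) for row in rows)


def is_known(sorted_keys, position, change):
    """Binary-search the sorted keys instead of scanning every row."""
    key = (position, change)
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


# Dummy rows, not real MITOMAP data.
keys = build_sorted_keys([
    {"position": 300, "change": "C-T"},
    {"position": 100, "change": "A-G"},
])
print(is_known(keys, 100, "A-G"), is_known(keys, 100, "A-T"))
```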

11/12/2008
Searching for new variants seems to be working. It was desirable to check these spreadsheets: polymorph, mMut, rtMut, somat, unpub, ins, del. However, due to the non-standardized way in which the ins and del entries are represented, these sheets are not searched.
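
A rough Python sketch of that search, for illustration only (the real version is an OO BASIC macro). The sheet names come from the list above; the row layout with `id`, `position`, and `change` keys and the function name `find_previous_reports` are assumptions.

```python
# The ins and del sheets are left out because their entries are not standardized.
SEARCHED_SHEETS = ("polymorph", "mMut", "rtMut", "somat", "unpub")


def find_previous_reports(sheets, position, change):
    """Return (sheet name, variant ID) for every earlier report of this variant."""
    hits = []
    for name in SEARCHED_SHEETS:
        for row in sheets.get(name, []):
            if row["position"] == position and row["change"] == change:
                hits.append((name, row["id"]))
    return hits


# Dummy sheets, not real MITOMAP data.
sheets = {
    "polymorph": [{"id": 1, "position": 100, "change": "A-G"}],
    "mMut": [{"id": 7, "position": 100, "change": "A-G"}],
    "ins": [{"id": 9, "position": 100, "change": "A-G"}],
}
print(find_previous_reports(sheets, 100, "A-G"))
```

Returning the sheet name alongside the ID is what lets the curator see how a previously reported variant was classified.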

Another GUI window called mitocurator was added. This GUI window will serve as the gateway to all curation tools. The GUI interface and underlying software will be called "mitocurator", while the spreadsheet document has been renamed to mitomap (although with a .ods file extension). "Openoffice mitomap" will represent the spreadsheet-managed version of MITOMAP data, and will replace "mitomap Excel".

11/7/2008
Directly editing the database doesn't seem like a good idea. Instead, move the rebuilding activity into mitocurator as a "syncing" function. How best to implement syncing? Maybe do a basic syncing based on the maximum value of the primary key, and re-implement the rebuild script for open office.
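
A minimal sketch of the "maximum primary key" syncing idea, in Python for illustration. The row layout and the function name `rows_to_sync` are hypothetical, and nothing here touches a real database.

```python
def rows_to_sync(sheet_rows, db_max_id):
    """Return the spreadsheet rows whose primary key is above the database's maximum."""
    return sorted(
        (row for row in sheet_rows if row["id"] > db_max_id),
        key=lambda row: row["id"],
    )


# Dummy rows: the database already holds IDs up to 2.
sheet_rows = [{"id": 3, "note": "new"}, {"id": 1, "note": "old"}, {"id": 2, "note": "old"}]
print(rows_to_sync(sheet_rows, db_max_id=2))
```

Note the limit of this basic approach: it only picks up appended rows. Edits or deletions of rows that are already in the database would go unnoticed, which is presumably where the rebuild script would still be needed.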

Skype call with Marie:

reference identifiers for "Previously Reported" variants should indicate the variant category
if not found in the database, then "Previously Reported" should give feedback of "Not currently in MITOMAP"
Submit button should give feedback about what was done
Variants need a "notes" field that is for internal use only
aachange column currently has a misleading representation of 'noncoding'; generate values that describe the transcript product
Clinical Fields:
- Disease: dropdown of diseases currently in the database with option to add
- Conservation: dropdown of H, M, L, NR, +, -
- Controls: textbox
- Homoplasmic: dropdown of +, -, NR
- Heteroplasmic: dropdown of +, -, NR
- Status: dropdown of everything currently in the database with option to add
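
The fixed-choice clinical fields above could be checked with something like the following Python sketch. The option sets are copied from the list; the dictionary layout and the function name `invalid_clinical_fields` are assumptions. Disease and Status are not checked because they allow new values, and Controls is free text.

```python
CLINICAL_CHOICES = {
    "Conservation": {"H", "M", "L", "NR", "+", "-"},
    "Homoplasmic": {"+", "-", "NR"},
    "Heteroplasmic": {"+", "-", "NR"},
}


def invalid_clinical_fields(entry):
    """Return the names of fixed-choice fields whose value is not an allowed option."""
    return sorted(
        field
        for field, allowed in CLINICAL_CHOICES.items()
        if field in entry and entry[field] not in allowed
    )


# Dummy entry: "H" is not a valid Homoplasmic value.
print(invalid_clinical_fields({"Conservation": "H", "Homoplasmic": "H", "Controls": "0/100"}))
```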

11/6/2008
Basic layout of the GUI for mitocurator looks okay. Have decided it would be best if we could ditch the spreadsheets and edit the database directly, but unsure of the technical difficulties. Get some feedback from Marie about the new mitocurator layout.

11/5/2008
Explored database interaction with OO. Bain and Pitonyak are both great sources of information. New book about database development from a guy at Stanford looks like a must have. Have OO making database queries, but haven't tried DML. How do database forms differ from dialogs?

11/3/2008
Skype call with Marie:

mitomap_prototype and mitoextractor work on her computer
add two generated fields to the right side of the variation_curation GUI: Variant ID# and Ref Info. Ref Info should be a listing of all the references that have been associated with the variant and should begin with the Ref ID#, but also have a short descriptor, similar to a citation entry.
Variants being added are either new or previously seen. If previously seen, an entry may exist in the polymorphism sheet or any of the clinical sheets. All of these sheets should be checked, and any entries are reported back to the curator.
Curating references is mostly a cut n' paste operation from "biglist", but a macro should handle the generation of new ref ID#s. Can this be embedded in the cell of the spreadsheet?
Curating clinical variants is a very similar process, only more information is added. Ideally, the same tool could be used by having a "clinical button" that would act as a toggle for showing additional fields in the data entry.
Priorities: 1) curating regular polymorphisms (Marie is waiting for this), 2) curating references, 3) curating clinical variants, 4) extracting data from phylotree and other sources.
Direct editing of the database seems worth exploring. At the very least mitocurator should be enabled to make database calls. Marie would like the ability to submit a Genbank number and have that sequence automatically retrieved and entered into the database. Ultimately, we need to associate variants with all of the sequences in which they are found. These associations should be generated within the database and presented within MITOMAP and other places in CMEM Web.
When entering a new variant: if the variant is not found, then populate the "Reference ID" field with the most recent entry of the Reference spreadsheet. The value populated should be composed of a prefix that is the reference ID# followed by a short human readable citation descriptor (e.g. "2300 - Wallace and Mishmar 2005"). This value can either be accepted by the curator or an alternative reference ID# can be entered. The mitocurator program will use the ID# prefix for associating the variant with a reference and ignore the rest of the value submitted.
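
The "ID# prefix plus descriptor" value described above can be sketched as a pair of small functions. This is a Python illustration, not the OO BASIC macro; the function names are hypothetical, and the example value is the one given in the note.

```python
import re


def reference_label(ref_id, descriptor):
    """Build the value shown to the curator: the ID#, a dash, then the descriptor."""
    return f"{ref_id} - {descriptor}"


def reference_id_from(value):
    """Read the leading ID# from a submitted value and ignore the rest."""
    match = re.match(r"\s*(\d+)", value)
    if match is None:
        raise ValueError(f"no reference ID found in {value!r}")
    return int(match.group(1))


label = reference_label(2300, "Wallace and Mishmar 2005")
print(label, "->", reference_id_from(label))
```

Because only the leading digits are read, the curator can either accept the suggested value or type a bare alternative ID#, and both are handled the same way.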

10/31/2008
Began adding some rCRS data structures in the mitocurator program. Getting the reference nucleotide works well, but still need to automate updating the field in the GUI. Also need to generate some sort of data structure that will index all the coding changes. Alternatively, this could be done with a database call, but two versions of this structure (Perl and BASIC) would be very useful. Maybe even JavaScript versions of all these as we start implementing some AJAX. Some of the functions are stored in my Bioinformatics OO library, so mitocurator probably won't run on Marie's computer without transferring these libraries.
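
For comparison, a reference-nucleotide lookup is short in most languages. The sketch below is Python, with hypothetical function names and a toy sequence standing in for rCRS.fasta. It assumes positions are 1-based, as in the usual mtDNA numbering.

```python
def read_fasta_sequence(text):
    """Join the sequence lines of a single-record FASTA string, skipping the header."""
    return "".join(
        line.strip() for line in text.splitlines() if line and not line.startswith(">")
    ).upper()


def reference_allele(sequence, position):
    """Return the reference nucleotide at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]


# Toy record for illustration only, not the real reference sequence.
toy = read_fasta_sequence(">toy\nGATC\nacag\n")
print(reference_allele(toy, 5))
```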

Added amino acid names to tool tip, but they all appear on one line. Doesn't seem to be any way of adding a line break in OO tool tips.

10/30/2008
Added functions to Spreadsheet library. 'getSheet' works well. More functions needed. Some snippets that will be useful for mitocurator:

oCell.Value = now ' use this for updating modification dates
oSheet.getRows.removeByIndex( 6, 3 ) ' use this to delete rows
oRow = oSheet.getRows.getByIndex( 0 ) ' get a row
oColumn = oSheet.getColumns.getByIndex( 1 ) ' get a column

Basic GUI for mitocurator is in mitomap_prototype. Values are being passed back. mitocurator should be aimed at managing the most tedious parts of MITOMAP curation, presumably polymorphisms. Other functionality might be added in the future. Aim at doing a good job with well-defined functionality, learn some more OO, and build my foundation libraries. Get input from Marie regarding the interface.

10/29/2008
Got a prototype mitoprocessor dialog object in OO. How to change the drop down box list?? Polish this a bit more and get Marie's input.

10/28/2008
pp works well for packaging mitoextractor into an executable. Attached prototype mitoextractor file selection executable. Next steps: Begin designing a GUI Dialog in OO for inputting new polymorphisms. Begin adding file validation code to mitoextractor.

10/27
Strawberry Perl with wxCocoaDialog works on my XP parallels, though the wxCocoaDialog does not seem to be a full port. Remember to install wxCocoaDialog into the C:\wxCocoaDialog directory on Marie's computer to keep the paths the same. Unfortunately, Platypus only bundles programs for distribution on a Mac and there does not seem to be a Windows counterpart. Use pp instead. Would be nice to be using ActiveState's developer tools in a situation like this. Next steps: Get a basic file selection script implemented in perl with wxCocoaDialog and packaged with pp.

Type	Attachment	Size	Date	Who	Comment
odt	AndrewMacro.odt	543 K	21 Jan 2009 - 22:56	UnknownUser	Draft of a book from Andrew Pitonyak - lots of useful material
pdf	Linux.com____OOo_Creating_a_simple_application_launcher.pdf	66 K	22 Jan 2009 - 01:09	UnknownUser	Nice article on creating a simple OO GUI application.
xls	PhylotreeVariants.xls	199 K	22 Oct 2008 - 21:36	UnknownUser	Phylotree variants harvested by Marie.
pdf	StarOffice_7_Basic_Programmers_Guide.pdf	1 MB	21 Jan 2009 - 22:54	UnknownUser	Star Office BASIC Documentation
pdf	StarOffice_Programmers_Tutorial.pdf	1 MB	21 Jan 2009 - 22:52	UnknownUser	Star Office programming tutorial
txt	create_oo_refseq_variables.pl.txt	413 bytes	31 Oct 2008 - 23:21	UnknownUser	Perl script for creating refseq data structures in OO BASIC
ods	haplogrid.ods	9 K	22 Jan 2009 - 00:29	UnknownUser	OO Spreadsheet with the haplogrid function.
ods	lymphoblast_pre-typing_Haplotyping_Filter_Report.ods	18 K	23 Jan 2009 - 00:22	UnknownUser	Sample GeneMap data for testing the GeneMap analysis.
odb	mito_database.odb	32 K	15 Sep 2009 - 23:21	UnknownUser	Open Office file that defines the mito database connection. Use this file to register a new data source.
exe	mitoextractor.exe	2 MB	28 Oct 2008 - 20:54	UnknownUser	Mitoextractor packaged into a Windows executable.
txt	mitoextractor.pl.txt	287 bytes	28 Oct 2008 - 20:54	UnknownUser	Prototype mitoextractor
ods	mitomap.ods	127 K	15 Nov 2008 - 01:19	UnknownUser	New OO version of mitomap. Replaces the Excel version and contains the mitocurator software.
xls	mitomap.xls	8 MB	22 Jan 2009 - 01:12	UnknownUser	Sample data file derived from Marie's actual Mitomap data spreadsheet.
htm	mtDNA_tree_Build_2.htm	2 MB	22 Jan 2009 - 01:14	UnknownUser	phylotree
fasta	rCRS.fasta	16 K	22 Jan 2009 - 01:15	UnknownUser	Revised Cambridge sequence
htm	references_Build_2.htm	49 K	22 Jan 2009 - 01:14	UnknownUser	phylotree reference list

Topic revision: r1 - 12 Feb 2016, MartyBrandon

Service provided by the

Center for Mitochondrial
& Epigenomic Medicine
at the Children's Hospital
of Philadelphia
