<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Bioinformatics Open Source Conference (BOSC)</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link href="/bosc2004/styles-site.css" rel="stylesheet" type="text/css" />
<link href="styles-site.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="header">Abstract Submissions</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Arek Kasprzyk</p>
<p align="left" class="title">Title: BioMart - a federated query architecture</p>
<p align="left" class="blogbody"><a href="http://www.ebi.ac.uk/biomart">BioMart</a> is a simple, query-oriented
data integration system based on distributed data warehousing ideas.
It offers a flexible, fast and practical data-mining framework for
computer-savvy bioinformaticians as well as life scientists without any
programming experience. Originally developed as EnsMart for Ensembl, it
has now been successfully applied to a variety of biological databases,
which can be accessed via the web and standalone interfaces.
<br />
<br />
The BioMart suite consists of a relational database schema specification,
an XML-based configuration system, administration tools for configuring
and deploying BioMart databases, and data access software written in Perl
and Java. A universal, query-optimised database schema, coupled with
domain-agnostic software, is responsible for the key features of the
BioMart system: generic applicability, large query network-scalability and
RDBMS-platform portability. Thus, the system can be readily deployed
to provide a unified set of query interfaces to datasources residing
anywhere on the available network. In addition, simultaneous querying of
multiple data sources spread over any number of servers is supported
via query-chaining.
<br />
<br />
BioMart is an open source project and all software is licensed under the LGPL.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Chris Mungall</p>
<p align="left" class="title">Title: BioMake: Functional Logical Task Management for Bioinformatics</p>
<p align="left" class="blogbody">A recurring pattern in bioinformatics architectures is the build
pattern, or pipeline. This can be defined as a computational
specification or template defining a collection of interdependent
tasks. Examples include biological sequence analysis pipelines and
data transformation pipelines (import and export of flatfiles, XML and
reports to and from relational databases).
<br />
<br />
Approaches range from the lightweight and generic to heavy duty
frameworks honed specifically for bioinformatics compute pipelines. An
example of the former is the UNIX Makefile, which specifies tasks in
which some files must be updated automatically whenever the files they
depend on change, and which is primarily used for program
compilation. Examples of the latter include object-oriented systems
such as BioPipe, which are tightly integrated with the BioPerl
library.
<br />
<br />
For our in-house task management we required something similar to
Makefiles in terms of level of abstraction and simplicity, yet without the
limitations of Makefiles and related systems (ant, scons, build, etc). In
particular we needed:
<br />
<br />
- Asynchronous task management on compute farms<br/>
- Choice of either relational database or filesystem for storing build
targets<br/>
- A cleaner specification language<br/>
- Fully programmable logic within the Makefile specification<br />
<br />
<br />
Our solution "BioMake" covers these requirements. It uses a declarative
language based around the concept of <i>skolem functions</i>. Each task in
the pipeline is specified as a function construct; for example, in a
genomic compute pipeline there may be function constructs "blastx(Seq,DB)"
and "genscan(Seq)". Each function construct represents a unique and
persistent identifier for the output of an executable. Functions can be
nested; for example "genscan(repeatmask(gi2177872))" represents the
results of running Genscan on a particular RepeatMasked sequence.
Dependent tasks are also specified as functions, and variable unification
is used as an alternative to Makefile-style pattern matching. Actions can
be parameterized using functions and variables. Functions are evaluated to
locators of the target data; for example, a filesystem path, or primary
key value in a database.
<br />
<br />
The task management engine is implemented in Prolog, and pipeline
specifications can use the Prolog code to provide full
programmability. Prolog is a declarative logic language and is
particularly suited to Makefile-style logic. However, the pipeline
programmer does not need to know Prolog in order to
construct or understand useful protocols.
<br />
<br />
The intention is to allow simple and concise specification of complex
pipelines. BioMake requires no object-oriented programming, and is not
tied to any particular language. We provide example customizable
compute pipelines which utilise standard bioinformatics analysis
programs such as BLAST, and infrastructure programs such as the
Apollo Bop parser, XSLT transforms and scripts using BioPerl.
<br />
<br />
More information on the system underlying BioMake can be found <a href="http://skam.sourceforge.net"> here</a>
</p>
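<p align="left" class="blogbody">The idea of nested function constructs that evaluate to locators can be
illustrated with a minimal Python sketch. This is an editorial illustration only, not BioMake's Prolog
implementation: the <i>Term</i> class, the <i>locator</i> path scheme and the <i>build_order</i> helper are all
hypothetical names chosen for this example.</p>

<pre>
```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A function construct such as genscan(repeatmask(gi2177872))."""
    functor: str
    args: Tuple[Union["Term", str], ...]

    def __str__(self) -> str:
        inner = ",".join(str(a) for a in self.args)
        return f"{self.functor}({inner})"


def locator(term) -> str:
    """Map a term to a file name (one hypothetical locator scheme)."""
    if isinstance(term, str):
        return term  # a plain identifier names an input, e.g. a sequence
    inner = "-".join(locator(a) for a in term.args)
    return f"{inner}.{term.functor}"


def build_order(term, seen=None):
    """List the tasks to run, dependencies first, each exactly once."""
    seen = [] if seen is None else seen
    if isinstance(term, Term):
        for arg in term.args:
            build_order(arg, seen)
        if term not in seen:
            seen.append(term)
    return seen


target = Term("genscan", (Term("repeatmask", ("gi2177872",)),))
for task in build_order(target):
    print(task, "=>", locator(task))
```
</pre>

<p align="left" class="blogbody">In this sketch each term is both the identifier of a task's output and
the key from which its storage location is derived, so the nested term alone determines which tasks must run first.
A database-backed locator could replace the file-name scheme without changing the terms.</p>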
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Lincoln Stein</p>
<p align="left" class="title">Title: GMOD: The Generic Model Organism Database Project</p>
<p align="left" class="blogbody">The Generic Model Organism Database (GMOD) Project is an open source
project to develop a complete set of software for creating and
administering a model organism database. Components of this project
include genome visualization and editing tools, literature curation
tools, a robust database schema, biological ontology tools, and a set
of standard operating procedures. This project is funded by the NIH
and the USDA Agricultural Research Service, with participation from
members of several database projects, including WormBase, FlyBase,
Mouse Genome Informatics, Gramene, the Rat Genome Database, TAIR,
EcoCyc, and the Saccharomyces Genome Database.
<br />
<br />
Released modules include Chado, a flexible modular relational schema
for genome information, Apollo, a genome feature editor and curator's
tool, GBrowse, a flexible web-based genome browser, Textpresso, a
paper indexing and search tool, the PubSearch/PubFetch literature
curation tools, and Caryoscope, a gene expression visualization
tool. Over the next year we will be releasing more components,
ultimately creating a model organism database construction set.
<br />
This talk will survey the released and pending GMOD tools, and
describe how they can be used for a variety of large and small
projects. The project URL is <a href="http://www.gmod.org"> http://www.gmod.org</a>
<br />
<br />
GMOD is released under a variety of Open Source licenses, primarily
the Perl Artistic License and GNU GPL.</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Ewan Birney</p>
<p align="left" class="title">Title: Ensembl - a portable Genome toolkit</p>
<p align="left" class="blogbody"> Ensembl is a genome information system designed for handling large
genomes, in particular human, mouse and other vertebrates. Its major code
bases can be broken down into three sections: a core relational schema and
API, a computational pipeline system and a user-friendly web site. The
Ensembl system has been designed principally to enable biologists to use
vertebrate genomes, but the source code of Ensembl is open source and
there has been increasing modularisation and clean-up of the system. This
means that Ensembl software has become increasingly useful as a toolkit
in itself for other genomes: we currently know of at least 8 genomes that
have been loaded and displayed using the Ensembl software outside of the
main Ensembl group.
<br />
<br />
I will present the aspects of Ensembl which are most open to reuse, in
particular how to load and run a new genome in Ensembl from existing
flat-file annotation, and give a sense of how to extend Ensembl, either using the
configurable DAS protocol or via schema additions. I will also briefly
outline the main concepts behind the pipeline.
<br />
<br />
License: BSD-style.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Toshiaki Katayama</p>
<p align="left" class="title">Title: BioRuby + KEGG API + KEGG DAS = wiring knowledge for genome and
pathway</p>
<p align="left" class="blogbody"> We have been developing BioRuby, a bioinformatics library for the Ruby
language, which enables users to write analysis pipelines easily. Here we show recent developments and how to
integrate BioRuby with the KEGG web services (API and DAS) to automate genome and pathway analysis procedures.
Note that KEGG API is a SOAP/WSDL-based web service providing gene and pathway information. KEGG DAS is also a web
service, providing genomic sequences and gene annotations via the DAS protocol. Both services are also developed
by us, and KEGG (Kyoto Encyclopedia of Genes and Genomes) is freely accessible at <a href="http://www.genome.ad.jp/kegg/"> http://www.genome.ad.jp/kegg/</a>
<br />
<br />
Project URLs:
<br />
<a href="http://bioruby.org/">BioRuby</a><br />
<a href="http://www.genome.ad.jp/kegg/soap/">KEGG API</a><br />
<a href="http://das.hgc.jp/">KEGG DAS</a><br />
<br />
License:
<br />
LGPL
<br />
<br />
On behalf of the BioRuby project,
Toshiaki Katayama
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Brian O'Connor</p>
<p align="left" class="title">Title: Turnkey, a Generic Data Visualization Tool</p>
<p align="left" class="blogbody"><a href="http://sf.net/projects/turnkey">Turnkey </a>is a generic engine for creating a fully functional website automatically. The project uses another open source project, <a href="http://sf.net/projects/sqlfairy"> SQLFairy</a>, to create a directed graph representation of SQL schema. It then uses a combination of Template Toolkit template files and the SQLFairy output to create an autogenerated website based on the underlying schema. This process is generic for any database and the resulting website is highly customizable through CSS and overriding layout templates. The GMOD project is currently using Turnkey to display model organism genome databases.<br />
An example site (displays best in Mozilla) can be found <a href="http://monkey.wooly.org/db/Feature/419602?skin=gmod_web"> here</a> or <a href="http://monkey.wooly.org/db/Feature/419602?skin=ensembl"> here </a> for a different skin.
<br />
<br />
The data being displayed is the latest GMOD database with the human
genome loaded. Currently, the CSS based skins have only been tested on
Mozilla, so if it looks horrible in IE I apologize. My next priority is
to ensure general cross-browser compatibility.
<br />
<br />
License:
Turnkey source code is currently available via CVS on <a href="http://sf.net">http://sf.net</a>.<br />
Future releases will be open source, most likely following the GPL.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Hidemasa Bono</p>
<p align="left" class="title">Title: SayaMatcher</p>
<p align="left" class="blogbody">SayaMatcher is a pipeline for matching short but meaningful
DNA elements on a genome scale. Various programs in the EMBOSS package are used for the calculation.
The results are exported in LDAS format so that they can be viewed in DAS-compatible genome browsers.
<br />
<br />
URL: <a href="http://kishoi.jp/SayaMatcher/">http://kishoi.jp/SayaMatcher/ </a> (under construction)
<br />
<br />
Availability: After publication, it is planned to be made freely
available.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Frank Gibbons</p>
<p align="left" class="title">Title: BioGraphNet, a distributed forum for heterogeneous biological networks.</p>
<p align="left" class="blogbody">Biological network information is increasingly abundant. The combination
of biological networks may be viewed as a multicolor graph, with each color
representing a different gene-gene or protein-protein relationship, e.g.,
protein interaction, sequence homology, correlated expression,
transcriptional regulation, genetic interaction (sensu synthetic
lethality), or metabolic relationship. Relationship types may be further
stratified by type of evidence supporting the relationship, by
directionality or by confidence measure. Furthermore, each organism has
its own collection of networks. Although this information's complexity
argues for its maintenance by distributed groups, much of its value is
derived through network integration.
<br />
<br />
BioMOBY has established a 'playground' for distributed services. We have
developed a 'sandbox' within BioMOBY called BioGraphNet. BioGraphNet is a
common standard and collection of services for sharing distributed network
information. We now serve several network data types, and encourage others
to participate, using the common standard objects we have registered in
BioMOBY's ontology.
<br />
<br />
As an example application illustrating the use of BioGraphNet, we provide
BioTrawler, a web-based biological network browser that dynamically
discovers suitable distributed data sources within BioGraphNet, integrates
those selected by the end-user 'just in time', and visualizes the graph
neighboring a user-defined set of genes. Graph layout is handled by the
open-source GraphViz package (modified to handle multiple edges between a
pair of nodes). BioTrawler also exports graph representations in several
commonly used formats (GIF by default, but also Pajek, Cytoscape, and PDF).
<br />
<br />
The combination of BioMOBY and BioGraphNet represents a distributed network
annotation system analogous to the Distributed Annotation System (DAS) for
sharing genome annotation.
<br />
<br />
LICENSING:
The object descriptions and service interfaces are already available to all
BioMOBY users. We plan to release BioTrawler under the Artistic License.
<br />
<br />
URL: <a href="http://llama.med.harvard.edu/cgi/BioTrawler"> http://llama.med.harvard.edu/cgi/BioTrawler </a>
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Brook G. Milligan</p>
<p align="left" class="title">Title: A Multiplatform Scientific Computing Environment</p>
<p align="left" class="blogbody">Bioinformatics is one of the fastest growing disciplines within
biology and computer science, and a proliferation of valuable
software is one of the most obvious results. In order to track
these developments, scientists are increasingly forced to manage
large and complex systems of application software, almost always
in ad hoc and platform specific ways. A system that can readily
manage the complex dependencies among software packages and can
ease the burden of installation and management is sorely needed.
Ideally, such a system would operate on a diversity of computing
platforms. This paper describes our integration of a variety of
bioinformatics software tools and scientific computing
applications into a common management environment. Notably, the
system is useable on at least 18 distinct operating system /
platform combinations. As a result, scientists are able to
easily install and manage a huge and diverse, yet wholly
integrated, array of software, while simultaneously retaining
flexibility in the choice of primary computing environment. Our
work with integrating approximately 100 major software packages
covering traditional bioinformatics (e.g., sequence analysis and
phylogenetic inference), computational biology broadly defined
(e.g., spatial modeling), and other scientifically useful
capabilities (e.g., teleconferencing) indicates that the effort
required is relatively little and greatly eases the burden of
creating scientific computing platforms.
<br />
<br />
Our work will ultimately be distributed from <a href="http://www.netbsd.org/Documentation/software/packages.html"> here</a>; currently, some of it is only available from <a href="http://pkgsrc-wip.sourceforge.net/"> here</a> and more will be deposited there shortly.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Levinson, Gene (NIH/NCI)</p>
<p align="left" class="title">Title: caBIOperl: A new Perl API to the NCI's biomedical domain object middleware</p>
<p align="left" class="blogbody">A reality of the bioinformatics community, and one of its strengths, is its
diversity, including the range of programming languages that are utilized. However, this poses an accessibility problem
for federated web-based resources, unless the APIs and databases can be readily accessed by diverse software development
languages. The U.S. National Cancer Institute Center for Bioinformatics (NCICB) addresses this issue by providing a
diversified set of open-source application programming interfaces to its caCORE system. These interfaces, part of the
object-oriented middleware component known as caBIO, allow developers to write caCORE-powered applications using their
choice of a native Java API, a SOAP-XML API, or even a simple HTTP-XML interface.
<br />
<br />
Each of these APIs delivers the same data and conforms to the same domain object model.
<br />
<br />
Since caBIO was first released, Perl programmers have found it rather inconvenient to access the caCORE system because
(1) they have to package their search criteria in SOAP or HTTP format and send the request to the caCORE server via
the respective protocol; and (2) they have to parse the returned XML to extract the information they need. This has
proven burdensome. For this reason we undertook the development of a new Perl API, recently released and named caBIOperl.
<br />
<br />
caBIOperl is completely object-oriented. It provides an abstraction layer over SOAP and XML, so that Perl programmers
work with caBIO objects, similar to what a Java programmer experiences with the native caBIO Java API.
<br />
<br />
caBIOperl wraps the lower-level SOAP and DOM packages, and thus shields the developer from needing to understand SOAP
or parse the XML. The first public release came out in April, 2004, and provides query access to 32 caBIO objects,
including ClinicalTrialProtocol, Pathway, and Gene.
<br />
<br />
caBIOperl thus provides native Perl access that allows developers to customize queries according to the specialized needs
of their local investigative teams. caBIOperl modules can be downloaded from the caBIO section of the <a href="http://ncicb.nci.nih.gov/download">NCICB download site</a>.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Steve Fischer</p>
<p align="left" class="title">Title: GUS - A Functional Genomics Infrastructure System</p>
<p align="left" class="blogbody">The Genomics Unified Schema (GUS) is a functional genomics
infrastructure system in use at about 20 projects across approximately a
dozen institutions. GUS was developed at the Computational Biology and
Informatics Lab (CBIL) as the infrastructure for <a
href="http://plasmodb.org/">PlasmoDB</a> , <a
href="http://www.cbil.upenn.edu/EPConDB/">EPConDB</a> and <a
href="http://www.allgenes.org/">AllGenes</a>. Over the last year we
have packaged GUS for distribution and moved its development to open
source, which has resulted in an active user and development community.
<br />
<br />
GUS includes a relational schema with more than 400 tables and views
covering approximately 50 functional genomics concepts. The schema is
organized into five name spaces. DoTS covers the central dogma (genes,
RNAs, proteins); sequence and features; reagents, including clones,
mapping and gene traps. RAD covers microarray experiments in a
MIAME-compliant representation. TESS covers transcription region
regulation; SRes covers controlled vocabularies, including about a dozen
standards-based vocabularies and ontologies. Finally, Core covers
non-biological concepts used to track users and data.
<br />
<br />
Upcoming schema expansion includes additional technologies (2-D gel and
mass spectrometry, in situ hybridizations) that will make use of common
experimental design and sample tables currently residing in the RAD
schema. We plan to work with emerging standards efforts for these
domains paralleling our involvement in the MGED effort for microarray
experiment information.
<br />
<br />
GUS also provides an application framework that includes a Perl and Java
object-relational layer; a Data Load API; many "plugins" to load
standard data sources; a Pipeline API to specify analysis protocols; and
a Web Development Kit (WDK). The WDK assists in the development of
data-mining oriented websites such as <a
href="http://plasmodb.org/">PlasmoDB</a>. It provides a servlet
framework, a declarative format to specify queries, results and records,
page layout, many sample queries and query result caching. The next
generation WDK is under development in collaboration with the <a
href="http://www.genedb.org">GeneDB</a> project at the Pathogen
Sequencing Unit of the Sanger Center, and uses a Struts and JSP based
model-view-controller design.
<br />
<br />
GUS runs under Linux, Tomcat and Oracle. PostgreSQL compatibility is
near completion. The source is freely available.
<br />
<br />
Homepage: <a href="http://www.gusdb.org">www.gusdb.org</a>
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Damian Gessler</p>
<p align="left" class="title">Title: Semantic MOBY as a World Wide Web architecture for bioinformatic interoperability</p>
<p align="left" class="blogbody">MOBY is an open source project for achieving interoperability in bioinformatics.
Research and development has proceeded along a dual-development track that consists of MOBY Services (with an emphasis on
SOAP technologies in a web services model) and Semantic MOBY (with an emphasis on RDF/OWL-DL in a semantic web model).
Semantic MOBY is designed specifically to operate in a nebulous and ever-changing world. In Semantic MOBY we identified
three problems that are hindering widely deployable, scalable interoperability, namely: i) the fatal mutability of
traditional interfaces (if a provider changes its interface, client code depending on that interface fails en masse);
ii) the rigidity and fragility of static classification schemes (changing the properties of a class near the root
of an inheritance hierarchy simultaneously affects the entire sub-tree); and iii) the confounding of structure and content
(content is entangled with the presentation layer and/or implicit behaviors of the presentation software).
<br />
<br />
Addressing these problems essentially recasts the problem of interoperability from being one of simply specifying a
syntax and messaging layer for syntactically connecting clients and providers via information in a registry look-up, to
being one of providing clients and providers a way to semantically describe their data and identify data relevant to
them. Our measure of success is to build an architecture that delivers: i) a common syntax; ii) shared semantics and a
mechanism for semantic negotiation; and iii) a discovery mechanism. This talk presents the Semantic MOBY architecture and
API and shows how this is accomplished.
<br />
<br />
Website: <a href="http://www.biomoby.org">www.biomoby.org</a>
<br />
<br />
Open Source License: Perl Artistic License
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Thomas Down</p>
<p align="left" class="title">Title: BioJava</p>
<p align="left" class="blogbody">BioJava is a pure Java framework which is useful for developing a wide
range of bioinformatics software, from small research scripts to
complex interactive applications. It includes powerful object models
for handling sequence and other kinds of biological data, and tools for
integrating and querying this information. It also provides a solid
foundation for developing novel analysis methods. General-purpose
implementations of techniques such as Hidden Markov Models and support
vector machines are included in the package.
<br />
<br />
BioJava was first released over four years ago. It is now an
established project and is widely used and supported around the world.
Significant improvements in the past year include the addition of a
data model for 3D structure information, better database support, and
improvements that make BioJava more powerful in a distributed computing
environment.
<br />
<br />
I will be talking about the status of the BioJava project and the kind
of problems for which it has proven useful, discussing its future
directions, and considering the issues involved in maintaining a large
software library.
<br />
<br />
URL: <a href="http://www.biojava.org/">http://www.biojava.org/</a>
<br />
Licence: LGPL
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Matthew Pocock</p>
<p align="left" class="title">Title: Taverna: Workflow Enactor for Bioinformatics</p>
<p align="left" class="blogbody">Taverna is a workflow enactor and graphical workflow editor, customised
for bioinformatics applications. Taverna is developed as part of MyGRID,
and is able to coordinate workflows over a wide range of services,
including EMBOSS tools (via Soaplab), SOAP services and MOBY services.
<br />
<br />
Taverna is distributed under the LGPL, and hosted on <a href="http://taverna.sourceforge.net/">SourceForge</a>.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Peter van Heusden</p>
<p align="left" class="title">Title: Applying software validation techniques to Bioperl</p>
<p align="left" class="blogbody">With computer software playing an increasingly pervasive role in
society, the risks associated with software failures have begun
receiving more attention. Infamous examples of such software failures
include the loss of the Mars Climate Orbiter (a victim of a metric vs.
imperial unit conversion error) and the fatal overdoses administered by
the Therac-25 medical accelerator (caused by an integer overflow). Even
when not catastrophic, software failure can be extremely costly: the US
Commerce Department's National Institute of Standards and Technology
(NIST) estimated in 2002 that poor-quality software costs US businesses
nearly $60 billion per year.
<br />
<br />
Concern about the costs and other risks of software failure has led to
increasing interest in 'software validation'. The US FDA defines
software validation as "confirmation by examination and provision of
objective evidence that software specifications conform to user needs
and intended uses, and that the particular requirements implemented
through software can be consistently fulfilled." In the commercial
world, this process of examination and evidence gathering tends to be
specified by formal procedures (e.g., TQM and ISO 9001) applied in the
context of formal software development methodologies.
<br />
<br />
In the open source world, collaborative development makes formal
procedures hard to apply. Instead, open source projects rely on "many
eyes mak[ing] all bugs shallow" (Eric S. Raymond). Unfortunately,
however, in a large project like Bioperl, not all components are used
equally frequently, and thus not every component is examined equally
thoroughly or often.
<br />
<br />
In order to remedy these shortcomings of the open source development
process, a systematic approach is needed. The existing code, tests and
documentation must be examined from the point of view of validation,
allowing us to bridge the gap between cooperative development (open
source), and the more formal, contractual space of commercial
development.
<br />
<br />
We have established a validation process and applied it to Bioperl. The
resulting validation framework has been developed in such a way that it
can be applied readily to other open source projects (e.g. Biojava).
The validation process, including documentation, Bioperl code changes
and the novel test code developed, will be described, as well as the overall
quality, reliability and usability improvements that result. We aim to
demonstrate how validation of Bioperl significantly increases its value
for all stakeholders.
<br />
<br />
LICENSING:
The Bioperl project addressed in the talk is licensed under the Perl
Artistic License, an accepted open source license according to the Open
Source Initiative. The work performed by Electric Genetics, as
described in the talk, results in two outcomes:<br />
1) ongoing contributions to the Bioperl suite, including improved
error handling, bug fixes and code additions. These all fall under the
Perl Artistic License and will form significant contributions to the
open source project.<br />
2) a commercial documentation and validation suite, offered to clients
as a commercial product. The documentation will be provided to paying
clients on a commercial basis and, thus, will not be immediately placed
in the Bioperl repository. The validation suite will be retained by
Electric Genetics and validation services offered to clients. If a
client wishes to purchase the validation suite, it will be licensed
using a commercial license.<br />
<br />
<br />
The business and licensing model we describe is similar to that of e.g.
Novell, which offers both commercial products (e.g. the Linux admin product
Red Carpet) and ongoing contributions to open source projects.
<br />
<br />
PROJECT URL:
<a href="http://www.egenetics.com/opensource.html">http://www.egenetics.com/opensource.html</a>
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Mark Poolman</p>
<p align="left" class="title">Title: Metabolic Modelling of 'Omic-Scale Systems</p>
<p align="left" class="blogbody">Metabolic modelling efforts to date have primarily been concentrated on
relatively small (~10s of reactions) systems. The increasing
availability of annotated genomes and related data means that the
potential exists to reconstruct the metabolism of a particular organism
directly from such sources. My talk will introduce some of the
principles of metabolic modelling and our experiences in integrating
such techniques. The modelling software we use is developed in house - ScrumPy,
metabolic modelling in Python, and available under GPL. The
bioinformatics components are still under development and not yet
publicly available (but might be by the time of the conference).
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Jason E. Stewart</p>
<p align="left" class="title">Title: Model Compilers for informatics</p>
<p align="left" class="blogbody">Many bioinformatics projects have very similar needs - they need a
relational DB to store information, they need annotation facilities to
add meta data to the data, they need a data transmission format, and
they need a programming interface to query the DB. We have explored
the use of Model Compilers to automatically build project SF directly
from Object Models - either using the Unified Modeling Language
(UML) for the MAGEstk project or using custom XML-based models for the
Genex project.
<br />
<br />
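The general idea of a model compiler can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not the Genex or MAGEstk model compiler: the MODEL dictionary, the SQL_TYPES mapping and both function names are hypothetical, and a real compiler would also emit the API, annotation and exchange-format layers described above.
<br />
```python
# Hypothetical object model: class name mapped to {attribute name: type name}.
MODEL = {
    "Experiment": {"name": "str", "description": "str"},
    "Sample": {"label": "str", "concentration": "float"},
}

# Assumed mapping from model types to SQL column types.
SQL_TYPES = {"str": "TEXT", "int": "INTEGER", "float": "REAL"}


def compile_table(class_name, attributes):
    """Emit one CREATE TABLE statement for a single model class."""
    columns = ["id INTEGER PRIMARY KEY"]
    for attr, type_name in attributes.items():
        columns.append(f"{attr} {SQL_TYPES[type_name]}")
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {class_name.lower()} (\n  {body}\n);"


def compile_model(model):
    """Emit the schema for every class in the model."""
    return "\n\n".join(compile_table(name, attrs) for name, attrs in model.items())


print(compile_model(MODEL))
```
<br />
Because the schema is derived from the model, adding an attribute to MODEL changes the generated SQL without any hand-written DDL.
<br />
<br />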
URL: <a href="http://genex.sf.net/">http://genex.sf.net/</a><br />
and:<br />
<a href="http://mged.sourceforge.net/software/index.php">http://mged.sourceforge.net/software/index.php</a>
<br />
<br />
License: Perl Artistic License (for the genex model compiler), and the BSD license (for the MAGEstk model compiler).
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Jason E. Stewart</p>
<p align="left" class="title">Title: Using OpenOffice.org as an analysis front end for bioinformatics</p>
<p align="left" class="blogbody">OpenOffice.org (OO.o) is a powerful Open Source office environment
that includes a spreadsheet application that provides multiple
scripting languages (currently StarBasic, Perl, and Python are
supported - the R language is supported via a bridge from either Perl
or Python) as well as Open Database Connectivity (ODBC). Using these,
the genex project has begun using OO.o as an analysis front end to our
microarray gene expression DB by directly querying the data from OO.o,
loading the query results into a spreadsheet, running R-based data
processing and analysis tools, and then saving the results back to the
DB.
<br />
<br />
We believe this approach will be widely applicable to many other
informatics projects.
<br />
<br />
URL: <a href="http://genex.sf.net/">http://genex.sf.net/</a>
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Chad Matsalla</p>
<p align="left" class="title">Title: Model Centric Architecture</p>
<p align="left" class="blogbody">As bioinformatics matures, the nature of bioinformatics software
development projects is changing. Existing projects and new efforts are
choosing to employ software engineering principles in lieu of ad hoc
development. These software engineering principles are being employed
because they represent best practices for creating reliable, stable, and
maintainable software. A significant aspect of the application of these
software engineering principles is the creation of models that describe
the domain of the software project.
<br />
<br />
Models carry numerous benefits to the software development and
maintenance lifecycle. Models communicate system behaviour, they allow
developers and maintainers to visualize and control the architecture of
a software system, they provide clear and effective artifactual
representation of the software system and domain, and they provide the
ability to develop software components that mimic elements of the
problem domain. Unified Modeling Language (UML) is the most widely
accepted language for expressing object-oriented analysis and design
decisions.
<br />
<br />
A system is described in which the design of models that accurately
describe the bioinformatics domain plays a central role in the complete
life-cycle of the software engineering process. Given a set of models,
this system generates software components that are designed to be
interoperable in a web services framework regardless of the languages
used to provide or consume the service.
<br />
<br />
An example is provided in which Bioperl objects are passed to a Java
client. This Java client deserializes the Bioperl objects and displays
the payload.
<br />
<br />
A second example is provided demonstrating a software application
implemented in Java Swing in which a large, complex object is passed to
and from a web service implemented in Perl using Bioperl.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: James Gilbert</p>
<p align="left" class="title">Title: The Otter Annotation System</p>
<p align="left" class="blogbody">The <a href="http://vega.sanger.ac.uk">VEGA database</a> presents high quality manual annotation of finished vertebrate
genomes. Until recently the finished clones that constitute the tiling path of the chromosome were annotated individually. Tags in the data objects that represented parts of RNA transcripts that span several clones were used to describe how
they should be fused. Fusing occurred during a conversion process that created an Ensembl database containing the
complete gene structures.
<br />
<br />
The otter project was developed in order to present the annotator with a view of a contiguous region of a chromosome made
from several clones, and to avoid the conversion step by storing the annotation directly in an Ensembl database.
<br />
<br />
The gene annotation data is passed between the annotation client and Ensembl database server in an XML format. The XML
contains the clone assembly information along with the gene structure data. It is hoped that the XML format will be
adopted as an exchange format by other centers who wish to display their annotation in VEGA.
<br />
<br />
The otter schema is an extension of the Ensembl database SQL schema. Additional tables store textual information about
transcripts, genes and clones added by the annotator, implement a clone level locking mechanism, and keep track of the
authors of particular annotations. These are accompanied by corresponding additions to the Ensembl Perl API. A
lightweight HTTP server written in Perl, otter_srv, exchanges XML with the client and saves the annotator's changes
to the MySQL otter database in a single transaction.
<br />
<br />
The annotators' graphical interface, otterlace, now incorporates a number of improvements, such as the display
of gapped alignments of sequence database hits to the genomic sequence.
<br />
<br />
The core otter software is available, under the same licence as <a href="http://www.ensembl.org/code_licence.html">Ensembl</a>, by anonymous CVS (package ensemblotter) from cvs.sanger.ac.uk, where it will be joined by the otterlace client
software. It is anticipated that a packaged distribution will also be created. The code is already in use by
some of our collaborators outside the Sanger Institute.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Peter Rice</p>
<p align="left" class="title">Title: EMBOSS: The European Molecular Biology Open Software Suite</p>
<p align="left" class="blogbody">EMBOSS started as an open source sequence analysis package and now
extends into protein structure, phylogenetics and other areas. A key
feature is the ease of integrating EMBOSS into other interfaces (web, GUI, SOAP, workflows, etc.).
<br />
<br />
URL: <a href="http://www.emboss.org/">http://www.emboss.org/</a>
<br />
<br />
Licence: GPL (and LGPL for the libraries and for associated packages)
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Michel Dumontier</p>
<p align="left" class="title">Title: The NCBI C++ Software Development</p>
<p align="left" class="blogbody">The NCBI is the host and developer of the world's largest bioinformatics projects.
As such, it has developed an extensive, powerful, documented and freely available bioinformatics programming platform
that contains a rich and robust set of functionalities designed to handle the intrinsic complexities of biology.
The NCBI C++ toolkit provides portable application framework classes for argument processing, diagnostics, exceptions,
connection streams, stream wrappers and threads. The C++ code generator tool transforms ASN.1 data specifications
into a ready-to-use, error-free set of C++ classes and functions to liberate the programmer from writing class variable
methods while providing garbage collection and object serialization to ASN.1/XML. An object manager facilitates
heterogeneous access to biological sequence data for annotation and display. Moreover, the toolkit offers excellent
support for database independent projects and complex CGI applications. This talk will provide a high-level overview
of the features and tools available in the NCBI C++ toolkit that enable computational investigations in biology
by third-party developers.
<br />
<br />
URL: <a href="http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/">http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/</a><br />
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Martin Senger</p>
<p align="left" class="title">Title: Life Sciences Identifiers. Finally?</p>
<p align="left" class="blogbody">Life Sciences Identifiers (LSIDs) are persistent,
location-independent, resource identifiers for uniquely naming
biologically significant resources including but not limited to
individual genes or proteins, or data objects that encode information
about them.
<br />
<br />
Their specification covers not only their syntax but also a
set of middleware-independent interfaces for resolving the
identifiers and accessing their associated metadata (such as
annotations).
<br />
<br />
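As a rough illustration of the syntax part only, an LSID is a URN laid out as urn:lsid:authority:namespace:object with an optional trailing revision. The Python sketch below splits such a string into its fields. The parse_lsid function, the LSID tuple and the example.org identifier are hypothetical names chosen for this example, and no resolution or metadata interface is modelled.
<br />
```python
from collections import namedtuple

# Field names follow the LSID layout: urn:lsid:authority:namespace:object[:revision]
LSID = namedtuple("LSID", ["authority", "namespace", "object_id", "revision"])


def parse_lsid(text):
    """Split an LSID string into its parts; raise ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected 5 or 6 colon-separated fields")
    # The 'urn:lsid' prefix is matched case-insensitively.
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError("an LSID must start with 'urn:lsid:'")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError("authority, namespace and object must be non-empty")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier, used only for illustration.
example = parse_lsid("urn:lsid:example.org:proteins:P12345:2")
print(example.authority, example.namespace, example.object_id, example.revision)
```
<br />
Keeping the authority and namespace as separate fields is what lets a resolver locate the issuing service without the identifier itself encoding a network location.
<br />
<br />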
The LSID Assigning service is responsible for creation of LSIDs for
given data entities.
<br />
<br />
URL:
<a href="http://www.omg.org/cgi-bin/doc?lifesci/03-12-02">http://www.omg.org/cgi-bin/doc?lifesci/03-12-02</a><br />
<a href="http://www-124.ibm.com/developerworks/oss/lsid/">http://www-124.ibm.com/developerworks/oss/lsid/</a><br />
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Bob Freeman</p>
<p align="left" class="title">Title: MAGIC Tools: A Suite of Programs to Aid in Gene Discovery and Expression Analysis via EST/Genome Sequencing and Microarray Analysis</p>
<p align="left" class="blogbody">The rapidly increasing rate at which biological data is being produced requires a concomitant development of relational
databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe
here a Modular Approach to a Genomic, Integrated, and Comprehensive (MAGIC) Database, and to associated Analysis and
Visualization Tools. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via
production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed both by
EST clustering and microarrays.
<br />
<br />
The Gene Discovery portion of the system focuses on information derived from DNA sequences. Aside from the Seq-LIMS
and Admin portions to support wet-lab activities, administration, and sequence processing, this portion supports working
with and viewing information about sequences and clones (Pipeline and SeqView), clustering via TGICL or via our novel
algorithm Olympiad (Cluster), automatically annotating genes and clusters via BLAST or BLAT (Annotation), and discovering
and classifying SNPs and microsatellites (Polymorphism).
<br />
<br />
The Microarray portion is a MIAME-compliant database with two components at present. These are Array-LIMS and
Array-DataManager, which make possible remote entry of all information into the database, and Array-Analysis, which
provides data mining and visualization. Spot calling and normalization are done externally through modular libraries,
allowing other tools to be used as preferred. Data are visualized through Spotfire and Spotfire server, though again
other tools may be used. Data in Spotfire server are linked real-time to MAGIC Database to provide on-demand information
about interesting data points.
<br />
<br />
Finally, because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only
for individual research laboratories, but also for core facilities that serve clients at any distance.
<br />
<br />
We plan for MAGIC to be an Open Source project before the end of this calendar year. Releases will be made either via
SourceForge or Bioinformatics.Org. Please see <a href="http://fungen.org">http://fungen.org</a> for more details.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Francois Pepin</p>
<p align="left" class="title">Title: BIAS: Bioinformatics Integrated Application Software</p>
<p align="left" class="blogbody">We introduce a Java open source development platform entitled Bias
(Bioinformatics Integrated Application Software) especially tailored to
Bioinformatics research and software development.
<br />
<br />
Bias aims to provide a rich toolkit for carrying out integrative
research addressing issues of data warehousing, data inter-operability
and the use of probabilistic learning
strategies such as Bayesian networks. It allows third-party tools to be
easily incorporated within the system, and it supports standards and
data-exchange protocols common to Bioinformatics including, for example,
the MIAME standard for gene expression data, R, and BioJava.
<br />
<br />
Bias is built upon an object-relational strategy thus allowing for all
of the positive aspects of both relational database systems and
object-oriented languages. In particular it allows for a consistent data
model that can be easily extended to include new objects and relations
in an automatic way.
<br />
<br />
The main project website can be reached at <a href="http://www.mcb.mcgill.ca/~bias/">http://www.mcb.mcgill.ca/~bias/</a> with username: biasweb, password: TheBIASpassword. It will be available to the public shortly.
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Henning Hermjakob</p>
<p align="left" class="title">Title: The PSI MI standard - open analysis of protein interaction data</p>
<p align="left" class="blogbody">The HUPO PSI protein interaction work group has jointly developed an XML
standard for the representation of protein interaction data, the PSI MI
format. PSI MI data is now available from major interaction data
providers, including DIP, MINT, and IntAct. Based on the PSI MI
standard, database and analysis tools from different providers can be
joined to efficiently analyse and manipulate protein interaction data.
We will present IntAct, an open source protein interaction database
and analysis tool which provides extensive PSI MI support. The web
interface provides both textual and graphical representations of protein
interactions, and allows exploring interaction networks in the context
of the GO annotations of the interacting proteins. IntAct is Java-based,
with Jakarta OJB object-relational mapping to Postgres or Oracle. PSI MI
upload and download are possible as well as dynamic access to
interaction networks by a web service or search URL. Direct URL
access allows PSI MI data to be opened and further analysed in the
open source tools ProViz and Cytoscape. These, in turn, provide a choice
of fast network visualisation algorithms, integration with expression
data, path finding and clustering in interaction networks.
<br />
<br />
Project URLs:<br />
<a href="http://psidev.sf.net">http://psidev.sf.net</a><br />
<a href="http://intact.sf.net">http://intact.sf.net</a><br />
<a href="http://www.cytoscape.org">http://www.cytoscape.org</a><br />
</p>
</div>
</div>
<div id="entry">
<div class="blog">
<p align="left" class="title">Author: Michael Watson</p>
<p align="left" class="title">Title: Systems Biology Integration</p>
<p align="left" class="blogbody">Systems Biology can be defined as the use of skills in mathematics and computer
science to integrate disparate sets of data to produce greater understanding of biological systems, and is a key
component of predictive biology. Scientific organisations need reliable and scalable informatics solutions to
enable research into systems biology. Here we present how freely available open source projects and software can be
integrated to produce a sophisticated bioinformatics platform, where microarray data and genomic sequence can be
integrated with functional annotation and predictive tools to identify groups of co-regulated genes and biological
pathways in host-pathogen interaction systems.
</p>
</div>
</div>
</body>
</html>