- Galina Boldina, Chiara Pascali and Martin Teichmann
Genome-wide determination of DNA-sequences bound by isoforms of human RNA polymerase III (Pol IIIα and Pol IIIβ)
In eukaryotes, transcription is carried out by DNA-dependent RNA polymerases I, II and III (or I-V in plants). These RNA polymerases are specialized in the transcription of specific groups of genes. Human RNA polymerase III (Pol III) transcribes small noncoding RNAs involved in the regulation of transcription (7SK RNA), RNA processing (U6 RNA, RNAse P, RNAse MRP), translation (tRNAs, 5S RNA) or other cellular processes (vault RNAs [multidrug resistance], adenoviral RNAs [VA-I, VA-II], Epstein-Barr virus RNAs [EBER1, EBER2]). It has furthermore been reported that some microRNAs of viral or cellular origin may also be transcribed by Pol III.
Interestingly, increased Pol III transcription levels accompany or cause cell transformation. The mechanisms underlying this phenomenon are still largely unknown. Recently, two distinct isoforms of human Pol III have been discovered (Haurie et al., 2010). RPC32β-containing Pol IIIβ is ubiquitously expressed and essential for growth and survival of human cells. In contrast, RPC32α-containing Pol IIIα is dispensable for cell survival and its expression is restricted to undifferentiated embryonic stem (ES) cells and to tumor cells.
The distinct effects of Pol IIIα and Pol IIIβ on cell growth and transformation may be explained by the transcription of isoform-specific target genes. To identify such isoform-specific target genes, we specifically targeted the RPC32α and RPC32β subunits in chromatin immunoprecipitation (ChIP) experiments and analyzed the co-precipitated DNA sequences by high-throughput sequencing (ChIP-seq).
Genome-wide localization of the RPC32α and RPC32β subunits of Pol III revealed the presence of both subunits on many of the known Pol III-transcribed genes, suggesting redundant activities of both isoforms of Pol III in transcription of these genes. We also found that some of the genes known to be transcribed by Pol III are occupied only by RPC32α or only by RPC32β, suggesting that these genes are exclusively transcribed by Pol IIIα or by Pol IIIβ, respectively.
The RPC32α and RPC32β ChIP-seq results furthermore led to the identification of novel Pol III candidate genes in HeLa cells. Moreover, we found high levels of Pol IIIα or Pol IIIβ at some of the annotated tRNA pseudogenes, implying that these genes may be transcribed. The functions of RNAs transcribed from novel putative Pol III genes or from tRNA pseudogenes remain to be determined.
- T. Bourquard, A. Ghoorah, J. Azé, A. Poupon, D. Ritchie
Can Voronoi fingerprints re-score and improve rigid body Hex docking predictions?
Protein-protein docking procedures normally consist of two successive steps.
Firstly, a search algorithm generates a large number of candidate conformations, and a
scoring function is then used to rank them in order to extract a near-native conformation.
Thanks to modern GPUs, the Hex docking program can now generate and score billions of
candidate conformations to produce a list of a few hundred high-quality candidate solutions
in just a few seconds. However, the Hex scoring function cannot normally identify a
near-native conformation from this list.
On the other hand, we have demonstrated previously that Voronoi tessellations of
protein interfaces, described by a relatively small number of surface properties for each Voronoi
cell, can be used to construct accurate scoring functions. However, such scoring functions
are not yet sufficiently sensitive for large-scale explorations of the interactome.
Here, we introduce a hierarchical clustering of three-dimensional docking predictions
to detect and discriminate sub-sets of near-native complexes from decoys generated by Hex.
This provides a new way to score docking candidates based on their similarity to the
interface fingerprints of known protein interfaces. Although this is still work in
progress, we expect that this novel Voronoi-based scoring scheme should give better
docking predictions than conventional shape-based or "knowledge-based" potentials.
- Pablo Carbonell, Davide Fichera, Jean-Loup Faulon
Predicting heterologous compound-forming reaction pathways through retrosynthesis hypergraphs
Retrosynthesis, a widely used technique in chemistry, can be utilized to engineer chassis organisms such as E. coli to synthesize specific target compounds. Our method is based on the representation of metabolic maps as annotated retrosynthesis hypergraphs where substrates, products and reactions are coded into molecular signatures. Substrates and products in this graph are represented by vertices linked by hyperedges. The bioretrosynthesis algorithm that we have developed searches for heterologous genes and their associated metabolites through the enumeration and ranking of all feasible hyperpaths going from a source set of metabolites to a desired target compound. To help decide which pathways are best to engineer, machine learning is used to mine genomic databases in order to predict protein function and the enzymes acting on specific substrates. The ranking is based on several criteria such as inhibitory effects, cytotoxicity of heterologous metabolites, and host compatibility (codon usage, homology), which are combined with in-house predictive tools (the MolSig package) in order to estimate Quantitative Structure-Activity Relationships (QSAR) for enzyme activity and reaction efficiency at each step of the identified pathways. The Bio-RetroSynth web server is an online tool for visualizing the retrosynthesis graph, which integrates novel biosynthesis pathway predictions with genomics, proteomics and metabolic information.
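The notion of a hyperpath from a source set of metabolites to a target can be illustrated with a minimal sketch. It is not the authors' bioretrosynthesis algorithm: it finds a single feasible pathway rather than enumerating and ranking all of them, and the function name `find_pathway`, the reaction names and the compound names are hypothetical.

```python
def find_pathway(sources, target, reactions):
    """Return the names of reactions forming one hyperpath to `target`, or None.

    reactions: dict mapping a (hypothetical) reaction name to a pair
    (set of substrates, set of products). A reaction can fire only when
    all of its substrates are available, which is what makes it a
    hyperedge rather than an ordinary edge.
    """
    available = set(sources)
    producer = {}  # compound -> reaction that first produced it
    changed = True
    # Forward closure: fire reactions until nothing new appears
    # or the target has been produced.
    while changed and target not in available:
        changed = False
        for name, (substrates, products) in reactions.items():
            new = set(products) - available
            if set(substrates) <= available and new:
                for compound in new:
                    producer[compound] = name
                available |= new
                changed = True
    if target not in available:
        return None
    # Backward trace: keep only the reactions the target depends on.
    needed, seen, todo = set(), set(), [target]
    while todo:
        compound = todo.pop()
        if compound in seen or compound not in producer:
            continue  # already handled, or a source compound
        seen.add(compound)
        reaction = producer[compound]
        needed.add(reaction)
        todo.extend(reactions[reaction][0])
    return sorted(needed)


# Toy network: R4 needs an unavailable substrate, R5 is irrelevant to T.
toy = {
    "R1": ({"S"}, {"A"}),
    "R2": ({"A"}, {"B"}),
    "R3": ({"A", "B"}, {"T"}),
    "R4": ({"X"}, {"T"}),
    "R5": ({"A"}, {"D"}),
}
pathway = find_pathway({"S"}, "T", toy)
```

A full enumeration would have to branch over every reaction able to produce each compound, which is where ranking criteria such as those listed in the abstract would come into play.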
- M. Chavent, A. Vanel, A. Tek, B. Levy, B. Raffin, M. Baaden
A rendering method for small molecules up to macromolecular systems: hyperballs accelerated by graphics processors
Ray-casting on Graphics Processing Units (GPUs) opens new possibilities for molecular visualization. We describe the implementation and calculation of diverse molecular representations such as licorice, ball & stick, space-filling van der Waals spheres and solvent-accessible surfaces using GPUs. We introduce an improved ball & stick representation called HyperBalls, which replaces the tubes linking the atom spheres with hyperboloids that connect them smoothly. This type of depiction is particularly useful to represent dynamic phenomena, such as the evolution of non-covalent bonds. It is furthermore well suited to represent coarse-grained models as well as spring networks. We will show that all these representations can be defined by a single general algebraic equation that is adapted for the ray-casting technique and is straightforward to implement on GPUs. Using GPU capabilities, this method can routinely, accurately and interactively render molecules ranging from a few atoms up to huge macromolecular assemblies with more than 500,000 particles. In favorable cases we have been able to display up to millions of atoms smoothly.
- Ghoorah A W, Devignes M-D, Smaïl-Tabbone M and Ritchie D W
Identifying interaction sites in protein families for knowledge-based protein docking
Protein-protein interactions (PPIs) play a key role in many cellular processes. In order to understand these processes, we need to be able to analyse and model them at the molecular level. There now exist several databases which contain almost all of the available experimentally-determined PPIs. We have developed an automated approach to extract and annotate a representative set of protein-protein interfaces for any given protein family using the PDB, PFAM and 3DID databases. For example, we annotate the 3DID interface residues according to their location in the interface (i.e. “core” or “rim”). We also calculate overall residue conservation. We store the calculated information for each interface in our database, KBDOCK. Overall, KBDOCK stores 5010 representative hetero protein-protein interfaces for 1203 protein families. Our approach provides a fast and convenient way to use the “docking by homology” principle to guide and focus protein docking calculations. Using this approach we successfully modelled CAPRI target 40, a complex between an API-A and two trypsins. We are now developing data mining approaches to use our database to identify and annotate protein-protein interaction sites in protein families. These annotations will be useful to guide and focus protein docking calculations, and hence make better docking predictions more quickly and easily.
- E. Laine, I. C. de Beauchêne, C. Auclair, J.-F. Mouscadet, L. Tchertanov
Propagation of point mutation effect throughout tyrosine kinase c-Kit structure, probed by molecular dynamics structure
The regulation of protein function is underpinned by concerted motions and conformational changes of structural fragments or domains. Such changes arise from information transmission through the residue network of the protein, upon association into macromolecular complexes or interaction with a ligand. Point mutations can also strongly influence the structural and dynamical properties of proteins. Molecular Dynamics (MD) simulations provide a way to characterize such influence and to understand the propagation of point mutation effects.
Protein kinases are highly proto-oncogenic and play major roles in all aspects of cellular physiology. In particular, the constitutive activation of the receptor tyrosine kinase c-Kit, mainly due to point mutations, is associated with mastocytosis and gastro-intestinal stromal tumors. Although some treatments exist, primary and secondary mutations can confer resistance to kinase inhibitors. For instance, the c-Kit D816V mutation confers resistance to imatinib treatment and displaces the equilibrium between the inactive and active states of the kinase. Here we have investigated the impact of D816V on c-Kit internal dynamics and have attempted to model the transmission of the mutation signal throughout the protein structure.
Combining multiple Molecular Dynamics (MD) simulations, Normal Mode Analysis (NMA) and pocket detection, we were able to reveal a long-range restructuring effect of the mutation on the juxta-membrane region of c-Kit, which normally plays a negative regulatory role on the kinase domain. These results enabled us to propose a molecular mechanism for the D816V-induced activation of c-Kit kinase. A transformation of Principal Components Analysis (PCA), called Local Feature Analysis (LFA), was applied to identify local dynamics domains and correlation patterns in the wild-type and mutant forms. We are now trying to infer routes of propagation for the mutation signal by characterizing the interactions between these local dynamics domains. We previously used a similar approach to model the electrostatic effect of calcium ions on the complex formed between Edema Factor of Bacillus anthracis and Calmodulin (Laine et al., Biophys J., 2009).
- Alexis Lamiable, Dominique Barth, Alain Denise, Franck Quessette and Sandrine Vial
3D RNA Structure prediction: an algorithmic game-theory approach
RNA molecules fold into a 3D shape to perform their biological functions in the cell. The folding process has been studied ever since the discovery of the catalytic properties of RNA, but while reliable methods exist for predicting the secondary structure, the tertiary structure remains elusive.
However, in order to predict a function, an approximate shape should be sufficient. We propose a coarse-grain prediction method that uses the modular and hierarchical nature of the folding process to divide the prediction into several easier steps.
- Feng Lou and Peter Clote
Parametric maximum expected accurate RNA structure
Recently, it has emerged that RNA secondary structure can be more accurately predicted by computing the Maximum Expected Accurate (MEA) structure, rather than the minimum free energy (MFE) structure, since the former depends on the entire low energy ensemble at thermodynamic equilibrium. We present a new algorithm which, for all values of k, computes the MEA(k) structure, i.e. the structure that has maximum expected accuracy over all secondary structures whose base pair distance from an initial structure S0 equals k. Our algorithm uses McCaskill base pair probabilities, dynamic programming, and backtracking. In addition, the free energy, the Boltzmann probability and an analogue of the partition function are calculated. We compare the MEA(k) structures with the MFE(k) structures, which are computed by the program RNAbor (Freyhult, Moulton, Clote); here, the MFE(k) structure has minimum free energy over all structures that differ from S0 by exactly k base pairs. Our motivation is to use the more easily computed MEA(k) structures to study possible folding pathways, to determine alternate low-energy structures, to predict potential nucleation sites, to explore structural neighbors of an intermediate, biologically active structure, and ultimately to locate riboswitches in the 5' UTR of mRNA.
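As a rough illustration of the dynamic programming and backtracking involved, the sketch below computes an unconstrained MEA structure from a matrix of base pair probabilities. It is not the MEA(k) algorithm of the abstract, since it ignores the distance constraint to S0. The scoring convention (twice the pairing probability for each pair, plus the unpaired probability for each free base), the `min_loop` parameter and the function name `mea_structure` are assumptions made for the example, and the probability matrix is taken as given, for instance from a McCaskill-style computation.

```python
def mea_structure(p, min_loop=3):
    """Return (score, pairs) of a maximum expected accuracy structure.

    p: symmetric n x n matrix, p[i][j] = probability that bases i and j pair
       (assumed input; not computed here).
    min_loop: minimum number of unpaired bases enclosed by a base pair.
    """
    n = len(p)
    q = [1.0 - sum(row) for row in p]  # probability that base i is unpaired
    # M[i][j] = best score on the half-open interval [i, j); M[i][i] = 0.
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    choice = [[None] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best, arg = M[i + 1][j] + q[i], None  # case: base i unpaired
            for k in range(i + min_loop + 1, j):  # case: base i pairs with k
                score = 2.0 * p[i][k] + M[i + 1][k] + M[k + 1][j]
                if score > best:
                    best, arg = score, k
            M[i][j], choice[i][j] = best, arg
    # Backtracking: replay the stored decisions to recover the base pairs.
    pairs, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            pairs.append((i, k))
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return M[0][n], sorted(pairs)


# Toy input: six bases, only the pair (0, 5) has appreciable probability.
toy = [[0.0] * 6 for _ in range(6)]
toy[0][5] = toy[5][0] = 0.9
score, pairs = mea_structure(toy)
```

Restricting the optimum to structures at base pair distance exactly k from S0 would require an extra index in the table, which this sketch leaves out.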
- Balaji Raman, Cecile Heyvaert, Jean-Marc Steyaert, Peter Clote
RiboPythia - Discovering Riboswitches and Secondary Structure Prediction
Riboswitches are found in the non-coding regions of RNAs. They regulate gene expression even though they lie in a portion of the RNA that does not code for any protein. A riboswitch comprises two parts: the aptamer and the expression platform. The aptamer naturally binds a ligand (e.g., thiamine pyrophosphate, TPP), and this binding induces structural changes in the expression platform. The aptamer part of the riboswitch is conserved in both primary sequence and structure. The expression platform, on the other hand, has low sequence and structure conservation. The expression platform is the region directly responsible for gene regulation. For example, a terminator stem in the expression platform is known to stop transcription, thereby affecting gene expression. The family of a riboswitch can be identified using the aptamer, while the expression platform determines the mechanism in which a specific member of the family is involved.
Existing software detects only the aptamer part of the riboswitch, and databases such as RFAM store just the aptamer part. We propose software to detect the complete riboswitch, that is, the aptamer and the expression platform. Existing folding algorithms do not predict the correct structure of the TPP riboswitch. We propose an automatic mechanism to detect and fold the riboswitch correctly.
In this work, we show how to successfully detect several classes of the TPP riboswitch family in a given genome sequence. We have manually written a descriptor for these riboswitches. We then feed this descriptor into an existing tool that accepts it and efficiently searches for several classes of the TPP riboswitch family. RNA secondary structure folding software allows constraints to be provided for folding a given sequence. We propose an approach to automatically generate the folding constraints for a given RNA sequence. These constraints force the RNA folding software to fold the sequence correctly.
Hence, given a genome sequence, our riboswitch detection tool, RiboPythia, can find the riboswitch, that is, the start of the aptamer and the end of the expression platform. The folding part of the computational pipeline then folds the input sequence into a correct structure. This tool should therefore help to detect putative riboswitches and to predict their secondary structure, so as to understand the mechanism of the riboswitch and its role in gene regulation and expression. We are already encouraged by the new TPP riboswitch hits that RiboPythia found in species where riboswitches have not been found so far. Experiments with RiboPythia have also led us to discover a new TPP riboswitch mechanism. These observations, that is, the putative riboswitch hits and the putative mechanism, are pending experimental validation.
- Cédric Saule, Mireille Régnier, Jean-Marc Steyaert, Alain Denise
Counting RNA pseudoknotted structures
In 2004, Condon and coauthors gave a hierarchical classification of exact RNA structure prediction algorithms according to the generality of the structure classes that they handle. We complete this classification by adding two recent prediction algorithms. More importantly, we precisely quantify the hierarchy by giving closed or asymptotic formulas for the theoretical number of structures of given size n in all the classes but one. This allows us to assess the tradeoff between the expressiveness and the computational complexity of RNA structure prediction algorithms.
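To give a feel for what counting structures of size n means, the sketch below counts the simplest class only, namely pseudoknot-free secondary structures, using the classical recurrence in which the last base is either unpaired or paired with an earlier base. The pseudoknotted classes treated in the abstract need more elaborate decompositions that are not reproduced here; the function name and the `min_loop` parameter are assumptions of the example.

```python
def count_secondary_structures(n, min_loop=1):
    """Number of pseudoknot-free secondary structures on n bases.

    min_loop: minimum number of unpaired bases enclosed by any base pair.
    Sequence content is ignored: any two positions are allowed to pair.
    """
    S = [1] * (n + 1)  # S[0] = 1: the empty structure
    for m in range(1, n + 1):
        total = S[m - 1]  # base m is unpaired
        # base m pairs with base j, enclosing m - j - 1 >= min_loop bases;
        # the j - 1 bases to the left of j fold independently
        for j in range(1, m - min_loop):
            total += S[j - 1] * S[m - j - 1]
        S[m] = total
    return S[n]


counts = [count_secondary_structures(n) for n in range(9)]
```

With `min_loop=1` these counts grow exponentially, with growth rate (3 + √5)/2 ≈ 2.618, which is the kind of asymptotic statement that makes classes of differing expressiveness comparable.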
- Mireille Regnier and Saad Sheikh
Clump properties in ROG-LOG graphs
Given a set of words H defined over an alphabet V, we study the problem of generating clumps. Clumps are words formed by connecting and extending words from H using overlapping prefixes and suffixes. It has been shown previously that the probabilities of generating random words in this manner can be determined using an overlap graph built from an Aho-Corasick automaton. Here we discuss further properties of these clumps and how they can be easily identified in the overlap graph. We discuss the concept of "bad clumps", clumps that are formed by overlapping more than two words at a position, and how they can be identified. We also discuss canonical and non-canonical extensions, how they may be identified in the overlap graph, and how they can thus be used to optimize the counting algorithms.
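The two basic objects, suffix-prefix overlaps between words of H and clumps in a text, can be sketched as follows. This is a naive quadratic illustration under our own naming (`overlaps`, `overlap_graph`, `find_clumps`); it does not build the Aho-Corasick automaton mentioned above, and it takes a clump to be a maximal run of overlapping occurrences of words from H.

```python
def overlaps(u, v):
    """Lengths k such that the last k letters of u equal the first k of v.

    Only proper overlaps are reported (k smaller than both word lengths).
    """
    return [k for k in range(1, min(len(u), len(v))) if u[-k:] == v[:k]]


def overlap_graph(words):
    """Map each ordered pair (u, v) with a proper overlap to its overlap lengths."""
    graph = {}
    for u in words:
        for v in words:
            ks = overlaps(u, v)
            if ks:
                graph[(u, v)] = ks
    return graph


def find_clumps(text, words):
    """Return the clumps of `text`: maximal runs of overlapping occurrences."""
    occurrences = sorted(
        (i, i + len(w))
        for w in words
        for i in range(len(text) - len(w) + 1)
        if text.startswith(w, i)
    )
    clumps = []
    for start, end in occurrences:
        if clumps and start < clumps[-1][1]:  # shares a position with the run
            clumps[-1][1] = max(clumps[-1][1], end)
        else:
            clumps.append([start, end])
    return [text[s:e] for s, e in clumps]


H = ["ABA", "BAB"]
graph = overlap_graph(H)
clumps = find_clumps("XABABAXXBABX", H)
```

In this toy case the first clump, ABABA, is built from three overlapping occurrences (ABA, BAB, ABA), and each consecutive overlap corresponds to an edge of the overlap graph.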
- Khemili S., Hamadouche T. and Gilis D.
Homology modeling study of 3D structure of families 5 and 21 house dust mite allergens
House dust mites are the most important source of indoor allergens that can cause allergic diseases. Knowledge of the 3D structures of allergens allows a better understanding of their biological function and of the epitope regions responsible for the allergic response, and helps the design of hypoallergens for allergen-specific immunotherapy.
We focus in this study on family 5 of house dust mite allergens, and on proteins from family 21 that share a sequence identity with family 5 larger than 30%. The biological function of proteins from families 5 and 21 is still unknown, and only a few studies have characterised the epitopes of these protein families. The three-dimensional structure of these group 5 and 21 allergens has not been determined experimentally except for Blo t 5 and Der p 5, isolated from Blomia tropicalis and Dermatophagoides pteronyssinus, respectively. The two allergens are formed of three similarly sized α-helices, which are tightly packed into an antiparallel bundle, but, in contrast to Blo t 5 (PDB code: 2JMH), which is reported to be a monomer, Der p 5 is a hexamer, which consists of an imperfect trimer of dimers (PDB code: 3MQ1). Here, we describe the comparative modeling of the structures of Der f 5 and Der f 21 from Dermatophagoides farinae, Sui m 5 from Suidasia medanensis, Lep d 5 from Lepidoglyphus destructor, and Der p 21 from Dermatophagoides pteronyssinus, using Der p 5 and Blo t 5 as templates for the comparative modeling program Modeller. After energy minimization and evaluation, the final models are docked with the rigid body protein-docking program “ClusPro” to predict the protein-protein interaction complexes and gain insight into the mechanism of dimerization. We discuss the ability of these allergens to dimerize by using several scoring functions. We also compare the binding modes and the interacting residues along the interface of the generated dimers. The results show that the mode of dimerization of Der f 5 could be similar to that of Der p 5. However, the dimerization of Sui m 5, Blo t 5, Lep d 5, Der p 21 and Der f 21 differs considerably from the mode observed for the Der p 5 complex.
- Olivier Stahl
Djeen : a high throughput multi-technological Research Information Management System for the Joomla! CMS
The current growth of high-throughput experimentation in biology research poses a huge challenge for storing and sharing the generated data and their annotations. Manipulating and integrating heterogeneous data types in a multidisciplinary project remains an even higher hurdle, for which proper management systems must be deployed. Data annotations need to be recorded and homogenized for proper data integration, while respecting Minimum Information standards (MIAME, MIFlowCyt).
Several databases and LIMS have been previously proposed, usually dedicated to the data management of a single technology [DNA microarray (BASE, EzArray, Longhorn Array Database), proteomics (ms_lims)]. Minimum Information standards are typically hardcoded and users have to be retrained on new laboratory practices. This prevents interdisciplinary and translational collaborations and the transfer of knowledge among laboratories generating heterogeneous information that needs to be integrated. On the bioinformatics development side, using these databases implies understanding complex and mostly unsupported APIs and carrying out difficult administrative tasks, which are serious obstacles to laboratory deployment.
In contrast with these solutions, we developed the Database for Joomla’s Extensible Engine (DJEEN) system, a multi-technological Research Information Management System (RIMS). It contains a complete pipeline to manage heterogeneous projects and organize experiments in a hierarchy, allows user and group rights management, features a template manager to create and manage standards (for instance Minimum Information standards), records experimental parameters, and manages multiple types of laboratory files with a simple and unique system. DJEEN is also capable of managing experimental and quality control parameters and clinical information. Focus has been placed on user-centric needs such as rapid and coherent annotation of a large set of files and data sharing with collaborators.
This tool was built as a Joomla Content Management System (CMS) component. This speeds up development through direct re-use of major Joomla features, including user and rights management and the web interface. DJEEN can be deployed quickly on a web server running a database and Joomla, which meets administrators' need for quick deployment.
DJEEN is publicly available from http://bioinformatique.marseille.inserm.fr/djeen .
- Stephan Vagner
Large-Scale Integration of microRNA and mRNA Expression Data for Identification of Prognostic Markers and Biological Networks in Cancer
Understanding regulated gene expression is central to providing insights into biological/pathological processes. Whereas much effort has been placed on deciphering transcriptional regulation, little is known about equally important post-transcriptional processes (such as splicing, polyadenylation or translation) that control the metabolism of transcribed mRNAs. Importantly, carcinogenesis is associated with defects in post-transcriptional gene regulation and a crucial role for miRNAs and alternative splicing is now well established in tumorigenesis. However, the intersection between miRNA regulation and transcript diversity has not been thoroughly studied. Using the 4T1 mouse model of metastasis development that reflects the sequence of multistep metastatic progression, we have recently identified about 700 alternative exons (with Affymetrix Exon arrays; Collab: D. Auboeuf’s team, Inserm Lyon; Dutertre et al., Cancer Res. 2010) and 50 miRNAs (with Exiqon miRNA microarrays) whose expression is deregulated during tumor progression. We now aim to develop computational approaches for integrating these different types of large-scale datasets and for developing tools for the direct inference of post-transcriptional regulatory circuits.
- Philippe Youkharibache, Korbinian Grote, Martin Seifert
Automated ChIP-Seq analysis for high throughput sequencing
As the throughput of sequencers such as the HiSeq increases and the efficiency of software improves, there is a need to automate sequencing data analysis to the maximum extent. Genomatix has developed a completely automated ChIP-Seq workflow that starts from mapped reads and automatically performs peak finding, overrepresentation analysis of transcription factors, and identification of DNA binding sequence motifs. Here we take a published study on genome-wide profiling of PPAR:RXR by Nielsen et al. (Genes Dev. 2008;22:2953-67) as an example and report the results, which are in line with the paper. This demonstrates that one can expect publication-quality results within a couple of hours of obtaining the reads from the sequencer.