Research in FL&C

This page list the main research lines investigated within our group.

PULP LLVM Compiler

The PULP Project is a major effort aimed at developing a new ultra-low power clustered multi-core platform for smart cameras and other embedded applications requiring significant performance, based on open source hardware and software. The project is supported by a multi-national team from ETH Zurich, Universita' degli Studi di Bologna, Politecnico di Milano, Commissariat à l’énergie atomique et aux énergies alternatives (CEA). At Politecnico di Milano, we develop the LLVM compiler support (C/C++) for the PULP core (OpenRISC) as well as OpenCL support for the PULP platform. Within this framework, we have several opportunities for master thesis work.

OpenCL Support for PULP

The goal of this thesis is to extend and optimize OpenCRun, an open source OpenCL runtime developed at Politecnico di Milano, to allow the efficient execution of OpenCL programs on the PULP platform. OpenCRun is based on the LLVM compiler framework, and is currently available for x86 NUMA multicores as well as for the explicitly managed memory many-core platform P2012, providing a good starting point for the port to PULP.

Clang/LLVM fractional arithmetics

Fractional arithmetics are a key feature to support numerical applications such as image analysis in low-power devices where hardware floating point units would be impossible to support. LLVM and its frontend, Clang, do not currently support the Embedded C standard, ISO/IEC TR 18037, which provides fractional arithmetics. The goal of the thesis is to introduce this support, targeting in particular the OR10n OpenRISC processor, which is the processing element employed in PULP.

A benchmark suite and library for embedded computer vision

Current computer vision libraries such as OpenCV provide reference implementations of algorithms tuned to execution on desktop machines endowed with efficient floating point units. The goal of this thesis is to build a computer vision library and benchmark suite employing fractional arithmetics instead of floating point, and targeting low-power embedded platforms.

Automated support for floating point to fixed point conversion

The goal of this thesis is to develop a tool to help the programmer in developing efficient applications employing real numbers in embedded systems. Based on a static analysis of the code (already developed in part), and on user requirements on the data value ranges expressed as pragmas, the tool will adapt the original code to use the least expensive alternative to floating point arithmetics satisfying the user requirements on output precision.

Contacts

Giovanni Agosta Gerardo Pelosi Alessandro Barenghi Michele Scandale Michele Beretta (mail: name.surname@polimi.it)

· %2014/%10/%20 %20:%Oct · Giovanni Agosta

Programming Models and Languages for Parallel Architectures

We study new programming models and languages for emerging architectures. The initial focus was on specific applications languages and models, targeting scalar operand networks (SON) and SIMD architectures. We currently focus on multicore architectures, striving to design a flexible but effective programming language based on OpenCL, CUDA and other industrial efforts.

Bibliography

  1. A. Di Biagio, A. Barenghi, G. Pelosi and G. Agosta. Design of a Parallel AES for Graphics Hardware using the CUDA framework. To appear in the 5th International Workshop on Security in Systems and Networks (SSN2009), Rome, May 2009.
  2. G. Agosta and G. Pelosi. A Domain Specific Language for Cryptography. In proceedings of the Forum on specification and Design Languages (FDL 07), Barcelona, Sep 2007.
  3. G. Agosta, L. Breveglieri, G. Pelosi and M. Sykora. Programming Highly Parallel Reconfigurable Architectures for Symmetric and Asymmetric Cryptographic Applications. In Journal of Computers, Vol 2, No 8, 2007, Academy Publisher.
  4. G. Agosta, L. Breveglieri, G. Pelosi and M. Sykora. Programming Highly Parallel Reconfigurable Architectures for Public-Key Cryptographic Applications. In proceedings of the 4th International Conference on Information Technology : New Generations (ITNG 2007), Las Vegas, Apr 2007.

Code Generation for a Runtime Configurable Loop Coprocessor

The idea of parallelizing loops using Single Kernel Modulo Scheduling on a high specialized coprocessor is attractive. We use a multiclustered Scalar Operand Network as coprocessor, where the topology of the net is reconfigurable.

Since the cluster assignment and the scheduling passes are performed statically, the compiler has to generate the reconfiguration instructions which adapt at the best, at run-time, the topology of the net to the currently scheduled DDG.

This project aims at investigating how to design an architecture which schedules successfully the large part of a program, exploting also the reconfigurability for both the goals to guarantee scalability and improve the quality of the schedule.

Goals achieved so far are the proposal of a novel approach to attack the problem of Instruction Partitioning over a Hierarchical Machine Model and its implementation in a Clustering Tool – which will become part of an industrial compilation toolchain for DSPFabric, a multicluster architecture designed by STMicroelectronics, Grenoble.

Bibliography

  1. Martino Sykora, Davide Pavoni, Joel Cambonie, Roberto Costa, Stefano Crespi-Reghizzi: “Hierarchical Cluster Assignment for Coarse-Grain Reconfigurable Coprocessors”. IPDPS 2007, Long Beach, Mar 2007.
  2. M. Sykora. PhD Dissertation, Politecnico di Milano, 2007

Past Research Topics

A description of some of our past research topics.

Visual Event Grammars

In this project, we proposed a new simple formal method, which combines various features such as modularity, code generation and automatic verification, to give a scalable notation to specify, design, validate and verify GUIs. Our approach, called Visual Event Grammars (VEG) is based on decomposing the specification of a large GUI into communicating automata. Breaking a complex scene down into communicating pieces may dramatically diminish the number of states, as shown by popular notations such as Statecharts.

Loop Parallelization by means of Affine Transformations

This project aims at building a set of tools that will allow the user to extract independent threads of computation (i.e., parallelism at a coarse grain) within loops, according tho the affine transformation framework.

The main contribution of this work will be the derivation of formal algorithms that will allow more parallelism to be extract than what is currently possible with state of the art tools.

Bibliography

  1. Anna Beletska. “Extracting coarse-grained parallelism with the Affine Transformation Framework and its limitations”. In Electronic Modelling, no.5, (2006)
  2. Anna Beletska, Pierluigi San Pietro. “Using the Affine Transformation Framework for Computer Simulation”. In Proceedings of International Conference on Advanced Computer Systems (ACS'2006)

JIT Compilation for the .NET framework

The ECMA specification of a bytecode language and the necessary runtime (.NET) features several improvements over the Java Bytecode language. This project aims at exploring the structure of a CIL to native (Pentium family and ARM) dynamic compiler, with focus on the intermediate representation and the opportunity of offering a standard interface for optimization plugins, as well as the optimization of the compilation and execution process for multi-processor or multi-threaded architectures.

This work is based in part on Portable.Net, a framework for the compilation and execution of applications for the Common Language Infrastructure, entirely composed of Free Software.

The first releases of this project can be found at the Sourceforge project page.

Bibliography

  1. S. Campanoni, M. Sykora, G. Agosta and S. Crespi Reghizzi. Dynamic Lookahead Compilation. To appear in Compiler Construction 2009, Mar 2009.
  2. S. Campanoni, G. Agosta and S. Crespi Reghizzi. A parallel dynamic compiler for CIL bytecode. In proceedings of the 16th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Oct 2008.
  3. S. Campanoni, G. Agosta, and S. Crespi Reghizzi. A parallel dynamic compiler for CIL bytecode. SIGPLAN Not. 43, 4 (Apr. 2008), 11-20
  4. Simone Campanoni, Giovanni Agosta, Stefano Crespi Reghizzi A parallel dynamic compiler for CIL bytecode, Technical Report, DEI – Politecnico di Milano, 2008-3.
  5. S. Campanoni, G. Agosta, and S. Crespi Reghizzi: ILDJIT: A parallel dynamic compiler, VLSI-SoC, Oct. 2008.
· %2013/%04/%18 %03:%Apr

Algebraic properties of structured context-free languages

The historical research line on the algebraic properties of structured CF languages initiated by McNaughton’s Parentheses Languages has recently attracted much renewed interest with the Balanced Languages, the Visibly Pushdown Automata languages (VPDA), the Synchronized Languages, and the Height-deterministic ones. Such families preserve to a varying degree the basic algebraic properties of Regular languages: boolean closure, closure under reversal, under concatenation, and Kleene star. We prove that the VPDA family is strictly contained within the Floyd Grammars (FG) family historically known as operator precedence. Languages over the same precedence matrix are known to be closed under boolean operations, and are recognized by a machine whose pop or push operations on the stack are purely determined by terminal characters. We characterize VPDA’s as the subclass of FG having a peculiarly structured set of precedence relations, and balanced grammars as a further restricted case. The non-counting invariance property of FG has a direct implication for VPDA too.

Bibliography
  1. Stefano Crespi Reghizzi and Dino Mandrioli. Algebraic properties of structured context-free languages: old approaches and novel developments, Technical Report, Apr 2009 (pdf)

Two-dimensional Languages and Picture Grammars

We have studied existing models of grammars for specifying 2D patterns and introduced more refined models based on tiling and on rewriting rules. We have characterized the capacity to generate different types of patterns, and the time complexity for pattern recognition.

Bibliography

  1. M. Pradella and S. Crespi Reghizzi: A SAT-based parser and completer for pictures specified by tiling, Pattern Recognition, Volume 41, Issue 2, February 2008, Pages 555-566
  2. S. Crespi Reghizzi and M. Pradella:A CKY parser for picture grammars, Information Processing Letters, Volume 105, 213-217, March 2008.
  3. A. Cherubini, S. Crespi-Reghizzi and M. Pradella; Regional languages and tiling: a unifying approach to picture grammars, MFCS 2008, Torun, August 2008.

Consensual Languages

This new approach to formal language definition aims to model situation where several computational processes must agree in order to recognize a string as valid. Although using as basic device a finite deterministic machine, the consensual model is able to define a broader family of languages than the regular ones. while preserving polynomial time complexity.

Bibliography

  1. S. Crespi Reghizzi and P.L. San Pietro, Consensual definition of languages by regular sets, LATA 2008, Tarragona, March 2008.
· %2013/%04/%18 %03:%Apr

Maximal Program Parallelization Within Trace Theory

We believe Trace Theory offers great potential for modelling concurrent and parallel computation. In order to assess its current development, we contribute to the ESF Workshop on Developments and New Tracks in Trace Theory, Cremona, Italy, 9-11 October 2008.

Our work aims at finding new algorithms for program parallelization by exploiting theoretical properties of trace languages, a class of languages that suitably model the dependencies between code instructions.

In particular, for a subclass of “well structured” programs, we are able to efficiently decide whether a sequence of instructions is an admissible re-scheduling of a given execution. We are now investigating how to extend this result to a wider class of programs.

A Java Virtual Machine for Micro-Edition CLDC

This project aims at providing a free (GPL'd) Virtual Machine based on the Java Micro Edition specification. This Virtual Machine, Jelatine, is characterized by its highly efficient use of memory, as well as the ability to scale from 16 bit processors up, and to be ported on most architectures that support ANSI C.

The current release, Jelatine 0.8, features interpreted execution only, but the virtual machine is geared towards JIT compilation. The Jelatine package and its documentation can be found at the Sourceforge project page.

Jelatine is developed and maintained by Gabriele Svelto.

Bibliography

  1. Gabriele Svelto. The Jelatine JVM, Whitepaper.
  2. Giovanni Agosta, Stefano Crespi Reghizzi and Gabriele Svelto. Jelatine: A Virtual Machine for small embedded systems. In proceedings of the 4th Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2006), Paris, Oct 2006.
  3. Gabriele Svelto. A Java virtual machine implementation for small embedded systems. Laurea thesis, Politecnico di Milano, July 2006

Dynamic Compilation for VLIW Processors

This project aims at exploring the possibilities of applying JIT translation of Java Bytecode (and related virtual machine instructions) on systems based on VLIW processors, exploiting the instruction-level parallelism as much as possible.

In particular, we implemented JIST, a scheduling JIT compiler, based on the open source Java Virtual Machine implementation Kaffe, and targeted to the Lx platform, a family of Very Long Instruction Word processors jointly developed by Hewlett Packard and STMicroelectronics. Within this framework, three main issues are considered: register allocation, instruction scheduling (both local and global), and memory disambiguation.

We are also investigating the impact of selective compilation and optimization on the performances of our virtual machine.

Bibliography

  1. G. Agosta, S. Crespi Reghizzi, D. Domizioli and M. Sykora. Global Instruction Scheduling in Dynamic Compilation for Embedded Systems. In proceedins of the 4th Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2006), Paris, Oct 2006.
  2. Giovanni Agosta, Stefano Crespi Reghizzi, Paolo Palumbo, Martino Sykora. “Selective Compilation via Fast Code Analysis and Bytecode Tracing”. In 21st ACM Symposium on Applied Computing, April 2006, Dijon, France
  3. Giovanni Agosta, Stefano Crespi Reghizzi, Gerlando Falauto, Martino Sykora. “Just-In-Time Scheduling Translation for Parallel Processors”. In Scientific Programming 3(13), 2005, IOS Press
  4. Dario Domizioli. “Superblock Scheduling in Just-in-Time Compilation of Java Bytecode for a VLIW processor”, Politecnico di Milano, July 2005
  5. Paolo Palumbo. “Profile-Guided Selective Run-Time Compilation of Java Bytecode”, Politecnico di Milano, July 2005
  6. Gerlando Falauto, Martino Sykora. “Run-time compilation and scheduling of Java Bytecode for a VLIW machine”, Politecnico di Milano, December 2003
  7. Giovanni Agosta. “Dynamic Compilation for Architectures with Instruction-Level Parallelism”. Doctoral dissertation, Politecnico di Milano, March 2004
  8. Giovanni Agosta, Stefano Crespi Reghizzi, Gerlando Falauto, Martino Sykora. “Just-In-Time Scheduling Translation for Parallel Processors”. In 3rd International Symposium on Parallel and Distributed Computing (ISPDC 2004), July 2004, Cork, Ireland
· %2013/%04/%18 %03:%Apr

Code Generation for a Runtime Configurable Loop Coprocessor

The idea of parallelizing loops using Single Kernel Modulo Scheduling on a high specialized coprocessor is attractive. We use a multiclustered Scalar Operand Network as coprocessor, where the topology of the net is reconfigurable.

Since the cluster assignment and the scheduling passes are performed statically, the compiler has to generate the reconfiguration instructions which adapt at the best, at run-time, the topology of the net to the currently scheduled DDG.

This project aims at investigating how to design an architecture which schedules successfully the large part of a program, exploting also the reconfigurability for both the goals to guarantee scalability and improve the quality of the schedule.

Goals achieved so far are the proposal of a novel approach to attack the problem of Instruction Partitioning over a Hierarchical Machine Model and its implementation in a Clustering Tool – which will become part of an industrial compilation toolchain for DSPFabric, a multicluster architecture designed by STMicroelectronics, Grenoble.

Bibliography

  1. Martino Sykora, Davide Pavoni, Joel Cambonie, Roberto Costa, Stefano Crespi-Reghizzi: “Hierarchical Cluster Assignment for Coarse-Grain Reconfigurable Coprocessors”. IPDPS 2007, Long Beach, Mar 2007.
  2. M. Sykora. PhD Dissertation, Politecnico di Milano, 2007
· %2013/%04/%17 %03:%Apr
research/index.txt · Last modified: 2014/10/20 20:12 by agosta
Recent changes RSS feed Creative Commons License Donate Driven by DokuWiki