USF Department of Computer Science and Engineering College of Engineering


This project is supported by the National Science Foundation, the University of South Florida's College of Engineering, and the Florida-Georgia Louis Stokes Alliance for Minority Participation in Science, Engineering and Mathematics (FGLSAMP).


Research Experiences for Undergraduates (REU) Summer Program in Computer Science and Engineering

Research Projects



Description of Projects for Summer 2005

  • Transport Layer Protocols for Wired, Wireless and Optical Networks and Active Queue Management Algorithms for High-Speed Routers



  • Neural Networks and Genetic Algorithms for Mining of Urban Transportation Data



  • Traveling Smart and Enhancing the Ridership Experience



  • Finite Element Breast Models for 3D/2D Fusion of MRI and Mammography



  • Scalable Data Mining Algorithms



  • Generating Self-Similar Network Traffic to Support Simulation Modeling of Packet Switches

  • Design and Evaluation of a File Transfer Program

  • Path Planning, Navigation Controllers and Communication Interfaces for Mobile Robots

  • Computer Engineering projects

    Transport Layer Protocols for Wired, Wireless and Optical Networks and Active Queue Management Algorithms for High-Speed Routers

    Project Supervisor - Dr. Miguel A. Labrador, PI, Assistant Professor, Department of Computer Science and Engineering, USF.

    Research activities in this area will involve students in the design and evaluation of transport layer protocols (TLP) for different types of networks, including wired, optical and wireless networks, such as satellite, ad hoc, and wireless sensor networks, and active queue management algorithms for high-speed routers.

    Students will conduct research on TCP, UDP, and other TLPs, on bandwidth estimation techniques, and on computational intelligence techniques, with the aim of improving the performance of current protocols or designing new ones, with a focus on heterogeneous networks. These enhancements will be evaluated using simulation tools such as the Network Simulator 2 and real testbeds such as the Web100-Dummynet testbed.
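
    As an illustration of what an active queue management algorithm does, the sketch below implements a simplified form of Random Early Detection (RED), one well-known AQM scheme. The page does not say which algorithms the project studies, so the choice of RED, the class name `RedQueue`, and all parameter values are assumptions made for this example. The sketch also omits the packet-count correction used in the full RED algorithm.

```python
import random


class RedQueue:
    """Simplified RED: drop arriving packets with a probability that grows
    with the exponentially weighted moving average of the queue length.
    All thresholds below are illustrative, not recommended settings."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002, seed=0):
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # smoothing factor for the moving average
        self.avg = 0.0          # smoothed queue length
        self.rng = random.Random(seed)

    def drop_probability(self):
        """Map the current average queue length to a drop probability."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self, queue_len):
        """Update the average with the instantaneous queue length and
        return True if the arriving packet should be dropped."""
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len
        return self.rng.random() < self.drop_probability()
```

    Dropping a few packets early, before the buffer is full, signals TCP senders to slow down, which is the interaction between router queues and transport protocols that this project area covers.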

     

    Neural Networks and Genetic Algorithms for Mining of Urban Transportation Data

    Project Supervisor - Dr. Rafael A. Perez, Co-PI, Professor, Department of Computer Science and Engineering, USF and Mr. Philip L. Winters, Program Director, Center for Urban Transportation Research, USF.

    In collaboration with the College of Engineering’s Center for Urban Transportation Research (CUTR), Dr. Perez is investigating ways to build more accurate models of how employees select among different modes of urban transportation for commuting to work. Such models would allow companies to implement the right mix of incentives to promote modes of transportation other than driving alone, thus reducing air pollution and fuel consumption. Individual companies in cities such as Los Angeles, Seattle, and Phoenix have collected large amounts of data spanning several years that reflect the effect that different combinations of incentive plans have had over time on the number of vehicles arriving at the companies’ worksites. Analysis of this type of data by urban transportation professionals has defied traditional techniques for a number of reasons.

    Different types of neural networks, including back propagation and radial basis function networks, are being investigated as possible ways to build accurate models from this type of data. The shortcomings and advantages of the different features of these networks are being analyzed and quantified. Once the models are built and tested, we will assist transportation professionals in evaluating them in the field. In addition to neural networks, genetic algorithms are being investigated as a possible means of generating formulas that capture the relationships between incentives, company characteristics, and the number of vehicles arriving at the worksite. The accuracy of these genetic algorithms will be quantified and contrasted with that of the other techniques mentioned here. The formulas generated by the genetic algorithms will be evaluated in conjunction with transportation professionals.

    Traveling Smart and Enhancing the Ridership Experience

    Project Supervisor - Mr. Philip Winters, Center for Urban Transportation Research, USF and Dr. Rafael Perez and Dr. Miguel A. Labrador, Department of Computer Science and Engineering, USF.

    In collaboration with the Department of Computer Science and Engineering, CUTR is investigating new ways to enhance the ridership experience by providing services that have not so far been available to individual users. Mr. Winters is leveraging a campus-wide wireless network and an intelligent transportation system that USF is installing to investigate travel behavior at the user level and to provide real-time feedback that helps users travel in smarter ways. In addition, he is investigating the effect on ridership of providing Internet access and increasing safety in public transportation units.

    The project has a strong programming component in the design and implementation of an electronic diary that collects user travel behavior and provides user feedback. First, this application, designed and implemented on a Personal Digital Assistant (PDA), will collect travel information directly from the user and the user's location from a GPS device. Second, this information will be transmitted over a wireless network (either public or private) to a central location (database). Then, the central location, using stored information and access to other databases, will “compute” an optimal travel behavior for the user. Finally, this information will be sent to the user’s PDA in real time so that he or she can improve his or her travel behavior. Important tasks and challenges are the design of the Graphical User Interface for the PDA, the design of the databases and access to existing ones, the algorithms or techniques to “compute” the optimal travel behavior, and the use of the wireless network infrastructure to transfer the information in real time. The project also contemplates incorporating this technology into cellular telephones.

    Finite Element Breast Models for 3D/2D Fusion of MRI and Mammography

    Project Supervisor - Dr. Dmitry Goldgof, Professor, Dept. of Computer Science and Engineering, USF.

    Detecting micro-calcifications and masses in mammograms (X-ray images of the breast) is critical for early diagnosis and successful treatment of breast cancer. Screening with a single mammogram has two shortcomings: (1) not all parts of the breast can be seen and examined in a single mammographic view, so some cancers will be missed because of their location, and (2) since a mammogram is acquired as an accumulated projection of X-rays through breast tissue, overlapping dense tissues, which are common in young women, may hide malignant features. Using more than one mammogram obtained from different angles (views) increases the chance of positively identifying breast abnormalities that could otherwise be missed in a single view. However, current two-view methods rely heavily on the radiologist's experience, and hence mammographic readings are often inconsistent and ambiguous. It is very difficult to fuse information from two different mammograms without good knowledge of the shape and deformation of the breast (the breast is compressed during X-ray imaging to reduce radiation dosage).

    The finite element method is a powerful numerical technique for modeling complex shapes and deformations. Finite element models have been used to study the mechanical behavior of many human organs, such as the heart, lung, and kidney, as well as the breast. A finite element breast model will provide three-dimensional information that is essential for linking suspicious features found in two mammographic views. To date, no two-view mammography study has been conducted using finite element modeling. Our approach is to use a finite element breast model to facilitate two-view mammographic interpretation. This model-based approach has five main steps:

    1. A finite element model of the breast is constructed from breast MR images.
    2. The breast model is compressed using recorded compression data.
    3. Features (calcifications and masses) are identified by radiologists in two mammographic views.
    4. The identified mammographic features are back-projected to obtain their positions in the breast model.
    5. The correlation of mammographic features from the two views is determined from their 3D positions.

    We have access to MR images and mammograms of a breast phantom, and a patient dataset is being collected. If only mammography data is available (MR imaging is more expensive than mammography), we will use a generic breast model to assist two-view mammography reading. If proven effective, the proposed two-view method using a 3D finite element breast model could change current clinical practice, and a large population of women may benefit from the increased cancer detection rate.

    Scalable Data Mining Algorithms

    Project Supervisor - Dr. Lawrence Hall, Professor, Department of Computer Science and Engineering, USF.

    There are two data mining projects within the Intelligent Systems Laboratory that have natural research slots for undergraduates. The main thrust of work in this lab is to develop scalable data mining algorithms. We are interested both in unsupervised grouping of data (clustering) and in supervised learning for classification. The problems being undertaken would have up to a terabyte of labeled or unlabeled data.

    The problem domains with very large amounts of data, which are easily accessible, are biological data, e.g., the secondary and tertiary structure prediction of proteins, and text classification (ranging from Web based information retrieval to general document grouping). In both problem domains, it is sensible to try to cluster the data because labeling can be both time-consuming and error prone. It is also quite reasonable to use labeled data to build models.

    The Protein Data Bank (PDB) contains over 20,000 proteins for which secondary and tertiary structures are known. New proteins are added all the time. While 20,000 proteins may not sound like much data, predictions are made at the amino acid level, and the average protein chain consists of around 150-200 amino acids. So, there are over 3.6 million examples, each typically with about 300 dimensions. Similarly, the World Wide Web clearly contains a large number of documents. There are also some smaller but still sizable standard data sets, such as the Reuters and newsgroup data, which have over 20,000 documents.
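
    The example count quoted above follows from simple multiplication. The figure of 180 amino acids per chain used below is an assumed value inside the stated 150-200 range, chosen because it reproduces the 3.6 million figure; it is not given on this page.

```latex
\[
20{,}000 \times 180 = 3.6 \times 10^{6} \ \text{examples},
\qquad
20{,}000 \times 150 = 3.0 \times 10^{6},
\qquad
20{,}000 \times 200 = 4.0 \times 10^{6}.
\]
```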

    To deal with very large data sets, we have explored a distributed learning approach. Below we describe a distributed approach to supervised learning as one potential project and an unsupervised clustering approach as another potential project.

    With very large data sets, it is impractical to move data around. Therefore, one might build a distributed classifier in which each individual classifier is built on a locally available training data set. These training data sets would be labeled and disjoint, with no examples in common. Previous work has shown that an ensemble built in this way can achieve accuracy rivaling that of a single classifier built on all the data. As the next step, we would build a classifier that provides probabilistic predictions of how likely each of the ensemble classifiers is to be correct for a particular type of example. That is, this classifier will allow a subset of the ensemble to be selected to vote on any given example. Such a classifier may need to be built incrementally, as is possible with a naive Bayes classifier or with recent work on support vector machine classifiers, and it must produce a likelihood or probability that a given classifier gives a correct answer. Incremental construction is needed if the classifier is to improve over time as additional labeled data arrives. We would train it on a separate set of training data. With large data sets, it is straightforward to set aside data for this separate training.

    There are challenges with this approach. An ensemble could contain as many as 1000 classifiers, although more typically the number will be around 100. It would be difficult to train a single classifier to rate 1000 other classifiers accurately, so it is likely necessary to have multiple classifiers, each of which predicts the accuracy of a subset of the ensemble. The separate training set could be drawn from the overall training set, with each classifier given only examples that were not in its own training set. Alternatively, an entirely separate training set can be used. Experiments comparing the accuracy of both approaches appear appropriate. There is also the issue of what threshold should be used to allow a classifier to vote. For example, one could allow a classifier to vote whenever it is predicted to be correct with a probability greater than 50%, or one might require something like 90%.
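
    The thresholded voting idea can be sketched as follows. This is a minimal illustration, not the laboratory's implementation: the function name `gated_vote`, the use of a simple majority among the admitted voters, and the fallback to the single most trusted classifier when nobody clears the threshold are all assumptions made for the example.

```python
from collections import Counter


def gated_vote(predictions, p_correct, threshold=0.5):
    """Combine ensemble predictions for one example.

    predictions: one predicted label per ensemble member.
    p_correct:   the gating classifier's estimated probability that each
                 member is correct on this example (same order).
    threshold:   a member votes only if its probability exceeds this value.
    """
    voters = [label for label, p in zip(predictions, p_correct) if p > threshold]
    if not voters:
        # Assumed fallback: trust the member rated most likely to be correct.
        best = max(range(len(predictions)), key=lambda i: p_correct[i])
        return predictions[best]
    # Majority vote among the admitted members.
    return Counter(voters).most_common(1)[0][0]


if __name__ == "__main__":
    labels = ["helix", "sheet", "sheet", "helix", "helix"]
    trust = [0.95, 0.60, 0.70, 0.40, 0.30]
    print(gated_vote(labels, trust, threshold=0.5))
    print(gated_vote(labels, trust, threshold=0.9))
```

    With a threshold of 0.5, three members vote and "sheet" wins two to one; with a threshold of 0.9, only the first member votes and the answer becomes "helix". This shows how the choice of threshold can change the ensemble's output.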

    An REU student can work to produce a distributed partition and evaluate how well the merge process works in producing a final partition. They can compare with other algorithms that are designed to cluster with large amounts of data.

    Generating Self-Similar Network Traffic to Support Simulation Modeling of Packet Switches

    Project Supervisor - Dr. Kenneth Christensen, Associate Professor, Department of Computer Science and Engineering, USF.

    Recent work of the Information Systems Laboratory has focused on new switch architectures, with simulation modeling used to evaluate the performance of new switch designs. Most of the simulation modeling work is done using the CSIM18 simulation engine. One architecture of interest is the Combined Input and Crossbar Queued (CICQ) switch. The performance of switch architectures such as the CICQ switch is very sensitive to the input traffic characteristics. A design that works well for simple Bernoulli arrivals with uniformly selected destination ports may be unstable for bursty or imbalanced traffic loads.

    An undergraduate student funded by this REU proposal would learn the basics of queuing theory, time series characterization, and simulation modeling. The student will be given reading assignments from Molloy’s Fundamentals of Performance Modeling for this basic learning. The student will develop traffic generators that produce simulated traces (of interarrival time and packet length) for correlated and self-similar network traffic. To do this, the student will consult the traffic modeling literature and implement in C some of the existing methods and mathematical expressions for generating time series with self-similar characteristics. The student will also carefully validate that the implemented traffic generators are correct. For example, the M/G/infinity queue model, in which the number of customers in residence is asymptotically self-similar, will be straightforward to implement and will be the first step. Other approaches include fractional Gaussian noise processes and ARIMA-based methods. Some of these methods are complex from both a conceptual and an implementation standpoint. The generators will be used by the graduate students in their switch design simulation modeling and will be made generally available as source code tools via the Christensen tools page. The Christensen tools page is accessed about 25 times per day, and students and researchers throughout the world use its tools.
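
    A minimal sketch of the M/G/infinity approach is given below, in Python rather than C for brevity. It assumes Poisson session arrivals and Pareto-distributed session durations with shape parameter between 1 and 2 (infinite variance), which is the usual setting in which the occupancy process is asymptotically self-similar. The function name, parameters, and default values are hypothetical, and the trace starts from an empty system, so the first part of the output is a warm-up transient that a real generator would discard.

```python
import random


def mg_infinity_counts(n_slots, arrival_rate, alpha, x_min=1.0, seed=0):
    """Return the number of active sessions in each of n_slots unit time slots.

    arrival_rate: mean number of new sessions per slot (Poisson arrivals).
    alpha:        Pareto shape of session durations; 1 < alpha < 2 gives
                  heavy tails with finite mean and infinite variance.
    x_min:        minimum session duration, in slots.
    """
    rng = random.Random(seed)
    delta = [0] * (n_slots + 1)  # +1/-1 markers where sessions start and end
    t = 0.0
    while True:
        t += rng.expovariate(arrival_rate)  # exponential interarrival time
        if t >= n_slots:
            break
        duration = x_min * rng.paretovariate(alpha)  # heavy-tailed holding time
        first = int(t)
        last_exclusive = min(n_slots, int(t + duration) + 1)
        delta[first] += 1
        delta[last_exclusive] -= 1

    counts = []
    active = 0
    for i in range(n_slots):  # running sum turns markers into occupancy
        active += delta[i]
        counts.append(active)
    return counts


if __name__ == "__main__":
    trace = mg_infinity_counts(n_slots=10_000, arrival_rate=2.0, alpha=1.5)
    print(trace[:20])
```

    Validating such a generator, as the project requires, would involve estimating the Hurst parameter of the output (for example with a variance-time plot) and comparing it with the value expected for the chosen shape parameter.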

    Design and Evaluation of a File Transfer Program

    Project Supervisor - Dr. Kenneth Christensen, Associate Professor, Department of Computer Science and Engineering, USF.

    Transferring files is an important use of IP networks. We would like file transfers to be as fast as possible (that is, we want the response time for a file transfer to be as short as possible). To achieve this, the file transfer program and its underlying protocols should make maximum use of the available bandwidth of the network. For this project, the REU student will learn sockets programming and how to instrument software programs to measure response time. The student will then develop (at least) two programs. One program will use stream sockets (i.e., TCP) and the other will use datagram sockets (i.e., UDP). Each program will comprise a client and a server and will be able to transfer a user-selected file from the server to the client. An example of a client execution is:

     

    c:\work>getfile server_name file_name

    c:\work> transfer of file_name starting

    c:\work> transfer complete in 12.94445 seconds

     

    For the UDP program, the student will investigate new ideas for detecting and resending lost packets. A starting point may be the NETBLT protocol. The goal is to see whether a UDP-based file transfer program can ever be faster than a TCP-based file transfer program. Key to the investigation is for the student to evaluate the programs for both small and large files and for a variety of networks, including high-speed, low-speed, distant, near, wired, and wireless. The student will also use existing data transfer benchmark programs to study TCP and UDP performance for bulk data transfer. These existing benchmarks include ttcp and netperf. Of interest to this project is the yearly Internet2 Land Speed Record (LSR) contest.
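
    The TCP half of the project, together with the response-time instrumentation, might look roughly like the sketch below. It is an illustration under simplifying assumptions: it is written in Python, the server sends an in-memory byte string rather than a file looked up by name, and server and client run in one process over the loopback interface. The function names are hypothetical.

```python
import socket
import threading
import time


def serve_once(listener, payload):
    """Accept one client, send the whole payload, then close the connection."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(payload)


def fetch(host, port, chunk_size=65536):
    """Connect, read until the server closes, and return (data, seconds)."""
    start = time.perf_counter()
    received = bytearray()
    with socket.create_connection((host, port)) as sock:
        while True:
            chunk = sock.recv(chunk_size)
            if not chunk:  # an empty read means the server closed the stream
                break
            received.extend(chunk)
    return bytes(received), time.perf_counter() - start


def loopback_transfer(payload):
    """Run server and client in one process over 127.0.0.1 for a quick check."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener, payload))
        server.start()
        data, elapsed = fetch("127.0.0.1", port)
        server.join()
    return data, elapsed


if __name__ == "__main__":
    data, elapsed = loopback_transfer(b"x" * 1_000_000)
    print(f"transfer of {len(data)} bytes complete in {elapsed:.5f} seconds")
```

    A UDP version would replace the stream socket with a datagram socket and would have to add its own sequencing, loss detection, and retransmission, which is where the research questions of this project lie.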

    Path Planning, Navigation Controllers and Communication Interfaces for Mobile Robots

    Project Supervisor - Dr. Kimon P. Valavanis, Professor, Center for Robot Assisted Search and Rescue (CRASAR), Dept. of Computer Science and Engineering, USF.

    The proposed project concentrates on designing simple path planning and navigation controllers and communication interfaces for a family of mobile robot platforms (such as the iRobot ATRV). The central idea behind the project is (eventually) to model a team of robots working together, coordinating and executing common tasks and communicating with each other, first in an indoor and later in an outdoor environment. To accomplish this objective, accurate models of the robot sensor suite must be derived, requirements and specifications for the communication protocols for ground-to-ground communication must be established, and robot dynamic models must be defined before the actual path planning and navigation controllers are designed.

    As a first step, the existing robot hardware and software architecture will be used as a reference point, followed by appropriate modifications to enhance robot flexibility and functionality. The different aspects of the project complement each other, as they aim at the same outcome.

    Students will familiarize themselves with robot systems, experiment with them, and then design novel interfaces that improve robot performance. Successful completion of the project goals will lead to a deeper understanding of Computer Science and Engineering and Electrical Engineering topics, motivating the students to enroll in graduate school and continue research.

    Computer Engineering projects

    Project Supervisor - Dr. Nagarajan Ranganathan, Professor, Dept. of Computer Science and Engineering, USF.

    1. A translator for converting C code and VHDL code into their corresponding control and data flow graphs (CDFGs). We have in-house software that converts VHDL designs into the corresponding CDFGs, and we want to extend it to work for applications written in C as well. The project requires a background in logic design and C programming.

    2. A GUI for the VHDL-to-CDFG generator software. An efficient GUI needs to be designed for an in-house software tool developed for converting VHDL code to the corresponding control and data flow graphs (CDFGs).

    3. Integration of a GUI with a multicrisis management software system. We have in-house multicrisis management software and a GUI framework that need to be integrated so that the crisis management tool has a usable GUI. This is a purely software project requiring Visual Basic.