Master of Science in Computer Science

Series

Master of Science in Computer Science

Permanent Link

https://hdl.handle.net/1853/73328

Series Type

Degree Series

Associated Organization(s)

Organizational Unit

School of Computer Science

Full item page

Publication Search Results

Now showing 1 - 10 of 17

Mitigating Racial Biases in Toxic Language Detection

(Georgia Institute of Technology, 2022-05-05) Halevy, Matan

Recent research has demonstrated how racial biases against users who write African American English exists in popular toxic language datasets. While previous work has focused on a single fairness criteria, we propose to use additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-remediation techniques, propagate racial biases even in a larger corpus. We then propose a novel ensemble-framework that uses a specialized classifier that is fine-tuned to the African American English dialect. We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets. We demonstrate how the ensemble framework improves fairness metrics across all sample datasets with minimal impact on the classification performance, and provide empirical evidence to its ability to unlearn the annotation biases towards authors who use African American English. ** Please note that this work may contain examples of offensive words and phrases.
Virtual Reality as a Stepping Stone to Real-World Robotic Caregiving

(Georgia Institute of Technology, 2021-05-04) Gu, Yijun

Versatile robotic caregivers could benefit millions of people worldwide, including older adults and people with disabilities. Recent work has explored how robotic caregivers can learn to interact with people through physics simulations, yet transferring what has been learned to real robots remains challenging. By bringing real people into the robot's virtual world, virtual reality (VR) has the potential to help bridge the gap between simulations and the real world. In this thesis, we present Assistive VR Gym (AVR Gym), which enables real people to interact with virtual assistive robots. We also provide evidence that AVR Gym can help researchers improve the performance of simulation-trained assistive robots with real people. Prior to AVR Gym, we trained robot control policies (\emph{Original Policies}) solely in simulation for four robotic caregiving tasks (robot-assisted feeding, drinking, itch scratching, and bed bathing) with two simulated robots (PR2 from Willow Garage and Jaco from Kinova). With AVR Gym, we developed \emph{Revised Policies} based on insights gained from testing the Original policies with real people. Through a formal study with eight participants in AVR Gym, we found that the Original policies performed poorly, the Revised policies performed significantly better, and that improvements to the biomechanical models used to train the Revised policies resulted in simulated people that better match real participants. Notably, participants significantly disagreed that the Original policies were successful at assistance, but significantly agreed that the Revised policies were successful at assistance. Overall, our results suggest that VR can be used to improve the performance of simulation-trained control policies with real people without putting people at risk, thereby serving as a valuable stepping stone to real robotic assistance.
Code-Upload AI Challenges on EvalAI

(Georgia Institute of Technology, 2021-05-04) Jain, Rishabh

Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We have developed several tools such as EvalAI which helps us in evaluating the performance of these systems and to push the frontiers of machine learning and artificial intelligence. Initially, the AI community focussed on simple and traditional methods of evaluating these systems in the form of prediction upload challenges but with the advent of deep learning, larger datasets, and complex AI agents, etc. these methods are not sufficient for evaluation. A technique to evaluate these AI agents is by uploading their code, running it on the sequestered test dataset, and reporting the results on the leaderboard. In this work, we introduced code upload evaluation of AI agents on EvalAI for all kinds of AI tasks, i.e.reinforcement learning, supervised learning, and unsupervised learning. We offer features such as scalable backend, prioritized submission evaluation, secure test environment, and running AI agents code in an isolated sanitized environment. The end-to-end pipeline is extremely flexible, modular, and portable which can later be extended to multi-agents setups and evaluation on dynamic datasets. We also proposed a procedure using GitHub for AI challenge creation to version, maintain, and reduce the friction in this conglomerate process. Finally, we focused on providing analytics to all the users of the platform along with easing the hosting of EvalAI on private servers as an internal evaluation platform.
Vulnerabilities in SNMPv3

(Georgia Institute of Technology, 2012-07-10) Lawrence, Nigel Rhea

Network monitoring is a necessity for both reducing downtime and ensuring rapid response in the case of software or hardware failure. Unfortunately, one of the most widely used protocols for monitoring networks, the Simple Network Management Protocol (SNMPv3), does not offer an acceptable level of confidentiality or integrity for these services. In this paper, we demonstrate two attacks against the most current and secure version of the protocol with authentication and encryption enabled. In particular, we demonstrate that under reasonable conditions, we can read encrypted requests and forge messages between the network monitor and the hosts it observes. Such attacks are made possible by an insecure discovery mechanism, which allows an adversary capable of compromising a single network host to set the keys used by the security functions. Our attacks show that SNMPv3 places too much trust on the underlying network, and that this misplaced trust introduces vulnerabilities that can be exploited.
Robust clustering algorithms

(Georgia Institute of Technology, 2011-04-05) Gupta, Pramod

One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have been long used across any different fields ranging from computational biology to social sciences to computer vision in part because they are simple and their output is easy to interpret. However, many of these algorithms lack any performance guarantees when the data is noisy, incomplete or has outliers, which is the case for most real world data. It is well known that standard linkage algorithms perform extremely poorly in presence of noise. In this work we propose two new robust algorithms for bottom-up agglomerative clustering and give formal theoretical guarantees for their robustness. We show that our algorithms can be used to cluster accurately in cases where the data satisfies a number of natural properties and where the traditional agglomerative algorithms fail. We also extend our algorithms to an inductive setting with similar guarantees, in which we randomly choose a small subset of points from a much larger instance space and generate a hierarchy over this sample and then insert the rest of the points to it to generate a hierarchy over the entire instance space. We then do a systematic experimental analysis of various linkage algorithms and compare their performance on a variety of real world data sets and show that our algorithms do much better at handling various forms of noise as compared to other hierarchical algorithms in the presence of noise.
Vision-based place categorization

(Georgia Institute of Technology, 2010-11-18) Bormann, Richard Klaus Eduard

In this thesis we investigate visual place categorization by combining successful global image descriptors with a method of visual attention in order to automatically detect meaningful objects for places. The idea behind this is to incorporate information about typical objects for place categorization without the need for tedious labelling of important objects. Instead, the applied attention mechanism is intended to find the objects a human observer would focus first, so that the algorithm can use their discriminative power to conclude the place category. Besides this object-based place categorization approach we employ the Gist and the Centrist descriptor as holistic image descriptors. To access the power of all these descriptors we employ SVM-DAS (discriminative accumulation scheme) for cue integration and furthermore smooth the output trajectory with a delayed Hidden Markov Model. For the classification of the variety of descriptors we present and evaluate several classification methods. Among them is a joint probability modelling approach with two approximations as well as a modified KNN classifier, AdaBoost and SVM. The latter two classifiers are enhanced for multi-class use with a probabilistic computation scheme which treats the individual classifiers as peers and not as a hierarchical sequence. We check and tweak the different descriptors and classifiers in extensive tests mainly with a dataset of six homes. After these experiments we extend the basic algorithm with further filtering and tracking methods and evaluate their influence on the performance. Finally, we also test our algorithm within a university environment and on a real robot within a home environment.
Performance understanding and tuning of iterative computation using profiling techniques

(Georgia Institute of Technology, 2010-05-18) Ozarde, Sarang Anil

Most applications spend a significant amount of time in the iterative parts of a computation. They typically iterate over the same set of operations with different values. These values either depend on inputs or values calculated in previous iterations. While loops capture some iterative behavior, in many cases such a behavior is spread over whole program sometimes through recursion. Understanding iterative behavior of the computation can be very useful to fine-tune it. In this thesis, we present a profiling based framework to understand and improve performance of iterative computation. We capture the state of iterations in two aspects 1) Algorithmic State 2) Program State. We demonstrate the applicability of our framework for capturing algorithmic state by applying it to the SAT Solvers and program state by applying it to a variety of benchmarks exhibiting completely parallelizable loops. Further, we show that such a performance characterization can be successfully used to improve the performance of the underlying application. Many high performance combinatorial optimization applications involve SAT solving. A variety of SAT solvers have been developed that employ different data structures and different propagation methods for converging on a fixed point for generating a satisfiable solution. The performance debugging and tuning of SAT solvers to a given domain is an important problem encountered in practice. Unfortunately not much work has been done to quantify the iterative efficiency of SAT solvers. In this work, we develop quantifiable measures for calculating convergence efficiency of SAT solvers. Here, we capture the Algorithmic state of the application by tracking the assignment of variables for each iteration. A compact representation of profile data is developed to track the rate of progress and convergence. The novelty of this approach is that it is independent of the specific strategies used in individual solvers, yet it gives key insights into the "progress" and "convergence behavior" of the solver in terms of a specific implementation at hand. An analysis tool is written to interpret the profile data and extract values of the following metrics such as: average convergence rate, efficiency of iteration and variable stabilization. Finally, using this system we produce a study of 4 well known SAT solvers to compare their iterative efficiency using random as well as industrial benchmarks. Using the framework, iterative inefficiencies that lead to slow convergence are identified. We also show how to fine-tune the solvers by adapting the key steps. We also show that the similar profile data representation can be easily applied to loops, in general, to capture their program state. One of the key attributes of the program state inside loops is their branch behavior. We demonstrate the applicability of the framework by profiling completely parallelizable loops (no cross-iteration dependence) and by storing the branching behavior of each iteration. The branch behavior across a group of iterations is important in devising the thread warps from parallel loops for efficient execution on GPUs. We show how some loops can be effectively parallelized on GPUs using this information.
Flexible access control for campus and enterprise networks

(Georgia Institute of Technology, 2010-04-07) Nayak, Ankur Kumar

We consider the problem of designing enterprise network security systems which are easy to manage, robust and ﬂexible. This problem is challenging. Today, most approaches rely on host security, middleboxes, and complex interactions between many protocols. To solve this problem, we explore how new programmable networking paradigms can facilitate ﬁne-grained network control. We present Resonance, a system for securing enterprise networks , where the network elements themselves en- force dynamic access control policies through state changes based on both ﬂow-level information and real-time alerts. Resonance uses programmable switches to manipulate traffic at lower layers; these switches take actions (e.g., dropping or redirecting traffic) to enforce high-level security policies based on input from both higher-level security boxes and distributed monitoring and inference systems. Using our approach, administrators can create security applications by first identifying a state machine to represent different policy changes and then, translating these states into actual network policies. Earlier approaches in this direction (e.g., Ethane, Sane) have remained low-level requiring policies to be written in languages which are too detailed and are difficult for regular users and administrators to comprehend. As a result, significant effort is needed to package policies, events and network devices into a high-level application. Resonance abstracts out all the details through its state-machine based policy specification framework and presents security functions which are close to the end system and hence, more tractable. To demonstrate how well Resonance can be applied to existing systems, we consider two use cases. First relates to "Network Admission Control" problem. Georgia Tech dormitories currently use a system called START (Scanning Technology for Automated Registration, Repair, and Response Tasks) to authenticate and secure new hosts entering the network [23]. START uses a VLAN-based approach to isolate new hosts from authenticated hosts, along with a series of network device interactions. VLANs are notoriously difficult to use, requiring much hand-holding and manual configuration. Our interactions with the dorm network administrators have revealed that this existing system is not only difficult to manage and scale but also inﬂexible, allowing only coarse-grained access control. We implemented START by expressing its functions in the Resonance framework. The current system is deployed across three buildings in Georgia Tech with both wired as well as wireless connectivities. We present an evaluation of our system's scalability and performance. We consider dynamic rate limiting as the second use case for Resonance. We show how a network policy that relies on rate limiting and traffic shaping can easily be implemented using only a few state transitions. We plan to expand our deployment to more users and buildings and support more complex policies as an extension to our ongoing work. Main contributions of this thesis include design and implementation of a ﬂexible access control model, evaluation studies of our system's scalability and performance, and a campus-wide testbed setup with a working version of Resonance running. Our preliminary evaluations suggest that Resonance is scalable and can be potentially deployed in production networks. Our work can provide a good platform for more advanced and powerful security techniques for enterprise networks.
Tuned and asynchronous stencil kernels for CPU/GPU systems

(Georgia Institute of Technology, 2009-05-18) Venkatasubramanian, Sundaresan

We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on a NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of a chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result suggests research on similarly "fast-and-loose" algorithms in the coming era of increasingly massive concurrency and relatively high synchronization or communication costs.
Prediction of secondary structures for large RNA molecules

(Georgia Institute of Technology, 2009-01-12) Mathuriya, Amrita

The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the length increases. We present a new parallel multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable accuracy of prediction. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³) and show that the exact algorithms such as ILSA are executed with our method in affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures which provides a base to implement future algorithmic improvements and improved thermodynamic model in GTfold. GTfold is written in C/C++ and freely available as open source from our website.