School of Computer Science Technical Report Series

Series Type
Publication Series
Associated Organization(s)
Organizational Unit

Publication Search Results

Now showing 1 - 6 of 6
  • Item
    Spam or Ham? Characterizing and Detecting Fraudulent "Not Spam" Reports in Web Mail Systems
    (Georgia Institute of Technology, 2011) Ramachandran, Anirudh ; Dasgupta, Anirban ; Feamster, Nick ; Weinberger, Kilian
Web mail providers rely on users to “vote” to quickly and collaboratively identify spam messages. Unfortunately, spammers have begun to use large collections of compromised accounts not only to send spam, but also to vote “not spam” on many spam emails in an attempt to thwart collaborative filtering. We call this practice a vote gaming attack. This attack confuses spam filters, since it causes spam messages to be mislabeled as legitimate; thus, spammer IP addresses can continue sending spam for longer. In this paper, we introduce the vote gaming attack and study the extent of these attacks in practice, using four months of email voting data from a large Web mail provider. We develop a model for vote gaming attacks, explain why existing detection mechanisms cannot detect them, and develop new, efficient detection methods. Our empirical analysis reveals that the bots that perform fraudulent voting differ from those that send spam. We use this insight to develop a clustering technique that identifies bots that engage in vote-gaming attacks. Our method detects tens of thousands of previously undetected fraudulent voters with only a 0.17% false positive rate, significantly outperforming existing clustering methods used to detect bots that send spam from compromised Web mail accounts.
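The abstract's key observation is that colluding bot accounts cast "not spam" votes on overlapping sets of spam messages. A minimal sketch of that idea (not the paper's actual clustering algorithm) is to flag pairs of voters whose voted-message sets are nearly identical:

```python
# Illustrative sketch, not the paper's method: flag account pairs whose
# "not spam" votes target nearly identical message sets. All names and
# the similarity threshold are hypothetical.
from itertools import combinations

def jaccard(a, b):
    """Set similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def suspicious_voter_pairs(votes, threshold=0.8):
    """votes: dict voter -> set of message IDs voted 'not spam'."""
    return [
        (u, v)
        for u, v in combinations(sorted(votes), 2)
        if jaccard(votes[u], votes[v]) >= threshold
    ]

votes = {
    "bot1": {"m1", "m2", "m3"},
    "bot2": {"m1", "m2", "m3"},
    "alice": {"m9"},
}
print(suspicious_voter_pairs(votes))  # [('bot1', 'bot2')]
```

A real system would compare millions of voters, so pairwise comparison would be replaced by a scalable clustering technique, as the abstract indicates.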
  • Item
    Practical Data-Leak Prevention for Legacy Applications in Enterprise Networks
    (Georgia Institute of Technology, 2011) Mundada, Yogesh ; Ramachandran, Anirudh ; Tariq, Mukarram Bin ; Feamster, Nick
Organizations must control where private information spreads; this problem is referred to in the industry as data leak prevention (DLP). Commercial solutions for DLP are based on scanning content; these impose high overhead and are easily evaded. Research solutions based on information flow control require rewriting applications or running a custom operating system, which makes them difficult to deploy. They also typically enforce information flow control on a single host, not across a network, making it difficult to implement an information flow control policy for a network of machines. This paper presents Pedigree, which enforces information flow control across a network for legacy applications. Pedigree allows enterprise administrators and users to associate a label with each file and process; a small, trusted module on the host uses these labels to determine whether two processes on the same host can communicate. When a process attempts to communicate across the network, Pedigree tracks these information flows and enforces information flow control either at end-hosts or at a network switch. Pedigree allows users and operators to specify network-wide information flow policies rather than having to specify and implement policies for each host. Enforcing information flow policies in the network allows Pedigree to operate in networks with heterogeneous devices and operating systems. We present the design and implementation of Pedigree, show that it can prevent data leaks, and investigate its feasibility and usability in common environments.
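The label check described above can be sketched as a simple subset test: a flow is permitted only if the receiver is cleared for every secrecy label the sender carries. This is an illustrative reconstruction of label-based flow control, not Pedigree's actual policy language; the label names are made up.

```python
# Hedged sketch of label-based information flow control in the spirit
# of Pedigree. The subset rule and all label names are assumptions.
def can_flow(sender_labels: frozenset, receiver_labels: frozenset) -> bool:
    """Allow a flow only if the receiver holds every secrecy label
    the sender has accumulated."""
    return sender_labels <= receiver_labels

payroll = frozenset({"finance", "pii"})      # process cleared for both labels
mail_client = frozenset({"pii"})             # process cleared only for pii

print(can_flow(mail_client, payroll))   # True: payroll may receive pii data
print(can_flow(payroll, mail_client))   # False: finance data would leak
```

Because the check is a pure function of two label sets, it can run equally well in a trusted host module or at a network switch, matching the deployment flexibility the abstract describes.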
  • Item
    Packets with Provenance
    (Georgia Institute of Technology, 2008) Ramachandran, Anirudh ; Bhandankar, Kaushik ; Tariq, Mukarram Bin ; Feamster, Nick
Traffic classification and distinction allows network operators to provision resources, enforce trust, control unwanted traffic, and trace unwanted traffic back to its source. Today’s classification mechanisms rely primarily on IP addresses and port numbers; unfortunately, these fields are often too coarse and ephemeral, and moreover, they do not reflect traffic’s provenance, associated trust, or relationship to other processes or hosts. This paper presents the design, analysis, user-space implementation, and evaluation of Pedigree, which consists of two components: a trusted tagger that resides on hosts and tags packets with information about their provenance (i.e., identity and history of potential input from hosts and resources for the process that generated them), and an arbiter, which decides what to do with the traffic that carries certain tags. Pedigree allows operators to write traffic classification policies with expressive semantics that reflect properties of the actual process that generated the traffic. Beyond offering new functionality and flexibility in traffic classification, Pedigree represents a new and interesting point in the design space between filtering and capabilities, and it allows network operators to leverage host-based trust models to decide treatment of network traffic.
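The tagger/arbiter split can be sketched minimally: the tagger stamps each packet with the provenance of the generating process, and the arbiter applies an operator policy over those tags. This is a hypothetical illustration of the architecture, not Pedigree's wire format or policy syntax.

```python
# Illustrative sketch of the tagger/arbiter architecture. The class,
# policy, and resource names are hypothetical, not from the paper.
from dataclasses import dataclass

@dataclass
class TaggedPacket:
    payload: bytes
    provenance: frozenset  # hosts/resources that influenced the sending process

def arbiter(pkt: TaggedPacket, blocked_resources: set) -> str:
    """Operator policy: drop traffic whose provenance touches a blocked resource."""
    return "drop" if pkt.provenance & blocked_resources else "forward"

pkt = TaggedPacket(b"GET / ...", frozenset({"browser", "/etc/passwd"}))
print(arbiter(pkt, {"/etc/passwd"}))  # drop
print(arbiter(pkt, {"/var/log"}))     # forward
```

The point of the design is that policy is written over process-level provenance rather than over coarse, ephemeral fields like IP addresses and ports.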
  • Item
    Fishing for Phishing from the Network Stream
    (Georgia Institute of Technology, 2008) Ramachandran, Anirudh ; Feamster, Nick ; Krishnamurthy, Balachander ; Spatscheck, Oliver ; Van der Merwe, Jacobus
Phishing is an increasingly prevalent social-engineering attack that attempts identity theft using spoofed Web pages of legitimate organizations. Unfortunately, current phishing detection methods are neither complete nor responsive because they rely on user reports, and many also require client-side software. Anti-phishing techniques could be more effective if they (1) could detect phishing attacks automatically from the network traffic and (2) could operate without cooperation from end-users. This paper performs a preliminary study to determine the feasibility of detecting phishing attacks in real-time, from the network traffic stream itself. We develop a model to identify the stages where in-network phishing detection is feasible and the data sources that can be analyzed to provide relevant information at each stage. Based on this model, we develop and evaluate a detection method based on features that exist in the network traffic itself and are correlated with confirmed phishing attacks.
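To make the idea of traffic-derived features concrete, here is a toy feature extractor over a hostname observed in the stream. These features are hypothetical stand-ins; the paper's actual features come from its traffic analysis, not from this sketch.

```python
# Hypothetical feature sketch: score a hostname seen in network traffic
# on simple phishing indicators. Feature names and the lookalike list
# are illustrative assumptions, not the paper's feature set.
import re

def phishing_features(host: str) -> dict:
    return {
        "num_dots": host.count("."),               # deep subdomain nesting
        "has_ip": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "has_brand_lookalike": any(b in host for b in ("paypa1", "bank0f")),
    }

print(phishing_features("login.paypa1.example.com"))
```

In an in-network deployment, features like these would be computed per connection and fed to a classifier trained against confirmed phishing attacks.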
  • Item
    Building a Better Mousetrap
    (Georgia Institute of Technology, 2007) Ramachandran, Anirudh ; Seetharaman, Srinivasan ; Feamster, Nick ; Vazirani, Vijay V.
    Routers in the network core are unable to maintain detailed statistics for every packet; thus, traffic statistics are often based on packet sampling, which reduces accuracy. Because tracking large ("heavy-hitter") traffic flows is important both for pricing and for traffic engineering, much attention has focused on maintaining accurate statistics for such flows, often at the expense of small-volume flows. Eradicating these smaller flows makes it difficult to observe communication structure, which is sometimes more important than maintaining statistics about flow sizes. This paper presents FlexSample, a sampling framework that allows network operators to get the best of both worlds: For a fixed sampling budget, FlexSample can capture significantly more small-volume flows for only a small increase in relative error of large traffic flows. FlexSample uses a fast, lightweight counter array that provides a coarse estimate of the size ("class") of each traffic flow; a router then can sample at different rates according to the class of the traffic using any existing sampling strategy. Given a fixed sampling rate and a target fraction of sampled packets to allocate across traffic classes, FlexSample computes packet sampling rates for each class that achieve these allocations online. Through analysis and trace-based experiments, we find that FlexSample captures at least 50% more mouse flows than strategies that do not perform class-dependent packet sampling. We also show how FlexSample can be used to capture unique flows for specific applications.
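The core mechanism — allocating a fixed sampling budget across traffic classes — can be sketched as follows. The formula is an illustrative reconstruction of class-dependent sampling, not FlexSample's exact online computation.

```python
# Sketch of class-dependent sampling in the spirit of FlexSample: given
# an overall sampling rate p and a target share of samples per class,
# set each class's rate so sampled packets match the allocation.
# The rate formula is an assumed reconstruction, not the paper's.
def class_rates(p: float, targets: dict, traffic_share: dict) -> dict:
    """p: overall sampling rate.
    targets: desired fraction of *sampled* packets per class (sums to 1).
    traffic_share: observed fraction of *total* packets per class."""
    return {
        c: min(1.0, p * targets[c] / traffic_share[c])
        for c in targets
    }

# Devote half the sample budget to "mouse" flows that are only 10% of traffic.
rates = class_rates(
    p=0.01,
    targets={"mouse": 0.5, "elephant": 0.5},
    traffic_share={"mouse": 0.1, "elephant": 0.9},
)
print(rates)  # mice sampled at 5x the base rate, elephants well below it
```

The lightweight counter array described in the abstract supplies the `traffic_share` estimates online, so a router can recompute the per-class rates as traffic shifts.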
  • Item
Understanding the Network-Level Behavior of Spammers
    (Georgia Institute of Technology, 2006) Ramachandran, Anirudh ; Feamster, Nick
This paper studies the network-level behavior of spammers, including: IP address ranges that send the most spam, common spamming modes (e.g., BGP route hijacking, bots), how persistent (in time) each spamming host is, botnet spamming characteristics, and techniques for harvesting email addresses. This paper studies these questions by analyzing an 18-month trace of over 10 million spam messages collected at one Internet "spam sinkhole", and by correlating these messages with the results of IP-based blacklist lookups, passive TCP fingerprinting information, routing information, and botnet "command and control" traces. We find that a small, yet non-negligible, amount of spam is received from IP addresses that correspond to short-lived BGP routes, typically for hijacked addresses. Most spam was received from a few regions of IP address space. Spammers appear to make use of transient "bots" that send only a few pieces of email over the course of a few minutes at most. These patterns suggest that developing algorithms to identify botnet membership, filtering email messages based on network-level properties (which are less variable than an email's contents), and improving the security of the Internet routing infrastructure may prove extremely effective for combating spam.
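One of the suggestions above — filtering on network-level properties rather than message contents — can be sketched as a check of the sender's IP against address ranges that historically emit mostly spam. The prefixes below are documentation-reserved examples, not real findings from the paper.

```python
# Hedged sketch of network-level filtering: flag mail whose sender IP
# falls inside spam-heavy address ranges. The prefix list is made up
# (RFC 5737 documentation addresses), purely for illustration.
import ipaddress

SPAMMY_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]  # hypothetical

def network_level_suspect(sender_ip: str) -> bool:
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in net for net in SPAMMY_PREFIXES)

print(network_level_suspect("203.0.113.9"))   # True
print(network_level_suspect("198.51.100.1"))  # False
```

The appeal of this approach, per the abstract, is that a sender's network location is far less variable than an email's contents, which spammers can trivially rewrite.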