Researchers have been studying Internet measurement problems at conferences such as the ACM SIGCOMM Internet Measurement Conference (IMC) for more than 10 years. This project will conduct a macroscopic study of which data have been used in related scientific publications over the last decade, and for what purpose. It will examine a number of papers published at IMC and answer questions such as: What type of data are the authors using? Were the data private or public? Were the data anonymized? What was the source of the data? For what purpose were the data used? This study will enable us to derive macroscopic statistics on how data are used in scientific publications studying the Internet.
Recent research by the Communication Systems Group of ETH Zurich has shown that frequent item-set mining, a simple, well-known, and widely used data mining method, is very effective at finding the root cause of network anomalies [3, 4] observed in network traffic flow data. This project will evaluate the running time and memory consumption of a selection of publicly available implementations of frequent item-set mining algorithms from [2]. Using a trace of traffic flows, it will establish which implementation (and associated algorithm) is best suited to extracting frequent item-sets from traffic flow data.
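To make the task concrete, here is a minimal Python sketch of the classic Apriori algorithm applied to flow records. It is not one of the implementations from [2] and not the method of [3, 4]; the item encoding (strings such as `dstPort=80`) and the function name `apriori` are hypothetical. Each flow is treated as a transaction whose items are its feature values, and an item-set counts as frequent if it appears in at least `min_support` flows.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every item-set contained in at least
    min_support transactions (absolute count). Illustrative sketch only."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-item-sets into k-item candidates.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count step: one pass over all transactions per level.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical flow records, each encoded as a set of feature=value items.
    flows = [
        {"srcIP=10.0.0.1", "dstPort=80", "proto=6"},
        {"srcIP=10.0.0.1", "dstPort=80", "proto=6"},
        {"srcIP=10.0.0.1", "dstPort=443", "proto=6"},
        {"srcIP=10.0.0.2", "dstPort=53", "proto=17"},
    ]
    for itemset, support in apriori(flows, 2).items():
        print(sorted(itemset), support)
```

Run on the flows of an anomalous time interval, a large frequent item-set such as `{srcIP=10.0.0.1, dstPort=80, proto=6}` would point at the traffic dominating that interval. The counting loop scans every transaction once per level and keeps all candidates in memory, which is exactly the kind of running-time and memory cost the benchmark would measure; algorithms such as FP-growth avoid explicit candidate generation altogether.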
Recently, mediated data analysis has been proposed as a way to help researchers bypass the legal complexities associated with accessing data. A mediator, usually the data owner, conducts part of the analysis on behalf of a researcher and, manually or automatically, evaluates whether the results reveal any private information. A number of techniques, such as secure queries [2] and differential privacy [4], have been proposed to measure or limit the amount of private information revealed by the output of the analysis. This project will survey four recent papers on this subject [1, 2, 3, 4], study the technical approaches they take, identify their advantages and limitations, and compare them.
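As an illustration of the differential-privacy side, here is a minimal sketch of the standard Laplace mechanism for a counting query that a mediator might answer on a researcher's behalf. It is the textbook construction, not necessarily the approach of [4]; the function names and the use of Python's `random` module are illustrative assumptions, and a real deployment would need a cryptographically secure noise source and care with floating-point effects.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d.
    exponential variates with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Answer 'how many records satisfy predicate?' with epsilon-DP.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical flow records held by the mediator: (dst_port, bytes).
    flows = [(22, 400), (80, 1200), (22, 380), (443, 5000), (22, 410)]
    # The researcher only sees the noisy answer, never the raw flows.
    print(private_count(flows, lambda f: f[0] == 22, epsilon=0.5))
```

A smaller `epsilon` adds more noise and gives stronger privacy, so the mediator trades the accuracy of each answer against the protection it provides.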
Obtain at least 5 packet-level traces from the WIDE backbone. The selected traces should cover different years and include different sampling points. Aggregate the observed packets into flows, where a flow is identified by its IP addresses, port numbers, and protocol number. Afterwards, compute flow-level statistics and plots that include at least the distribution (e.g., as a CDF) of packets per flow and the distribution of bytes per flow. Moreover, try to identify changes over the years. Finally, record a packet-level trace on your laptop or desktop (e.g., over one hour) that captures your normal usage, analyze it, and compare it with the results from the WIDE traces. To complete the project, describe your results in a short report and present your experiences and analysis in class.
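The flow aggregation and CDF steps can be sketched in Python as follows, assuming packets have already been parsed from the trace (e.g., with a pcap library) into records with the hypothetical fields shown in `Packet`. Flows here are unidirectional and keyed by the 5-tuple with no idle or active timeout; real flow meters such as NetFlow exporters also split flows on timeouts, so treat this as a simplification.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical record layout for one parsed packet.
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    length: int  # packet size in bytes


def packets_to_flows(packets):
    """Aggregate packets into unidirectional flows keyed by the 5-tuple.

    Returns {(src, dst, sport, dport, proto): [packet_count, byte_count]}.
    No timeouts are applied: all packets with the same key form one flow.
    """
    flows = defaultdict(lambda: [0, 0])
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flows[key][0] += 1
        flows[key][1] += p.length
    return dict(flows)


def ecdf(values):
    """Return (xs, ys) for an empirical CDF step plot: ys[i] = (i + 1) / n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


if __name__ == "__main__":
    trace = [
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, 60),
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, 1500),
        Packet("10.0.0.2", "10.0.0.1", 80, 1234, 6, 40),
        Packet("10.0.0.3", "8.8.8.8", 5353, 53, 17, 80),
    ]
    flows = packets_to_flows(trace)
    pkts_per_flow = [c for c, _ in flows.values()]
    bytes_per_flow = [b for _, b in flows.values()]
    print(ecdf(pkts_per_flow))
    print(ecdf(bytes_per_flow))
```

Plotting `ecdf(bytes_per_flow)` as a step function, typically with a logarithmic x-axis, gives the bytes-per-flow CDF; overlaying these curves for traces from different years is one way to spot the changes the project asks about.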
Read at least the three provided papers. Imagine you are a professor and have to present this topic to students: How would you summarize these three papers? How are they related to each other? What is your opinion on this topic? Write a survey report on this topic and present the results in class.