CS 525s - Resources - Fall 2006
Where to find relevant material?
This page lists where to look up papers
to supplement your understanding of a topic,
to prepare your proponent presentation or your rebuttal,
to get started on your research project, and so on.
One good online server to visit whenever you
are looking for high-quality database-related publications is
DBLP (Digital Bibliography and Library Project):
Of course, the library is another great resource:
it offers interlibrary loan, search services for
the IEEE digital library and the ACM digital library,
citation indices, and so on, all of which you should familiarize
yourself with. These are tools you will likely value
in future projects well beyond this course.
Journals (typically published once a month):
ACM Transactions on Database Systems (ACM TODS)
IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)
Data and Knowledge Engineering Journal (DKE)
Information Systems Journal
ACM Transactions on Office Information Systems (ACM TOIS)
IEEE Transactions on Software Engineering (IEEE TSE)
Database Conference proceedings (typically published once a year):
ACM Special Interest Group on Management of Data (ACM SIGMOD)
Int. Conf. on Very Large Data Bases (VLDB)
IEEE Int. Conf. on Data Engineering (ICDE)
Int. Conf. on Information and Knowledge Management (CIKM)
ACM Symposium on Principles of Database Systems (PODS)
The above lists are only a subset; there are additional respectable
sources for database material. For this
course, you should "visit" the library to familiarize yourself with
additional methods of locating material from valid sources.
Exclusive use of
material from "non-refereed" web sites is not sufficient
and at times dangerous to one's health :). Know your sources!
Selected Subset of Stream and Sensor-Related Publications
Special Workshops in Sensors/Streaming (typically published once a year):
A variety of smaller workshops are beginning to be offered,
typically attached under the umbrella of some larger conference.
2nd International Conference on Geosensor Networks (GSN2.0),
WORKSHOP, Oct 1-3 2006, to be held in Boston, MA
Some Subset of Relevant Project Web Sites
Some Subset of Possible Sources of Data
Data is either synthetic or 'real'.
Synthetic data is data that you have generated yourself, typically
using some special-purpose generator, with the goal of controlling
certain parameters that you deem likely to influence the
outcome of your results, such as
arrival rates, selectivities, and so on.
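As a rough illustration of such a generator, here is a minimal Python sketch. It is not tied to any particular benchmark: the function name generate_stream, the (timestamp, value) tuple layout, the Poisson arrival model, and the "value < threshold" predicate are all assumptions made for this example.

```python
import random


def generate_stream(n, arrival_rate, selectivity, threshold=50.0, seed=0):
    """Return n synthetic (timestamp, value) tuples (hypothetical layout).

    arrival_rate: mean number of tuples per second; gaps between tuples are
                  drawn from an exponential distribution (Poisson arrivals).
    selectivity:  expected fraction of tuples satisfying value < threshold.
    """
    rng = random.Random(seed)  # fixed seed so experiments are repeatable
    timestamp = 0.0
    tuples = []
    for _ in range(n):
        timestamp += rng.expovariate(arrival_rate)
        if rng.random() < selectivity:
            # This tuple should pass the predicate value < threshold.
            value = rng.uniform(0.0, threshold)
        else:
            # This tuple should fail the predicate.
            value = rng.uniform(threshold, 2.0 * threshold)
        tuples.append((timestamp, value))
    return tuples


if __name__ == "__main__":
    stream = generate_stream(n=10_000, arrival_rate=100.0, selectivity=0.2)
    passing = sum(1 for _, value in stream if value < 50.0)
    print("observed selectivity:", passing / len(stream))
    print("observed rate (tuples/s):", len(stream) / stream[-1][0])
```

Varying arrival_rate and selectivity independently lets you see how each one affects your results while everything else is held fixed.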
On the other hand, you are also encouraged to work with "real data" in
your experiments and demonstration, if at all possible. You can
often find data on the web if you do some looking.
Ideas for 'real data' include:
The Lawrence Berkeley Laboratory
has wide-area TCP trace data available.
You can view it at:
Scientific data such as
are also possible.
Their data rates tend to be slow (e.g., one measurement every ten
See NOAA for a possible source of such data.
We can get you access to real fire measurement data recorded
by sensors installed in burning buildings, from
the WPI digital fire library. Talk to the instructor about that.
Example queries of interest here might detect anomalies
by looking for sensors that report readings severely
different from those of their neighbors. Such a query
would involve a self-join between two copies of the sensor stream,
with a window to ensure time correlation,
a predicate on the spatial proximity of the two sensors,
and of course a predicate on the difference between their readings.
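The self-join described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple layout (timestamp, sensor_id, x, y, reading), the function name detect_anomalies, and the parameters window, max_dist, and min_diff are hypothetical and do not reflect the actual format of the fire data or the API of any stream engine.

```python
from collections import deque
from math import hypot


def detect_anomalies(stream, window, max_dist, min_diff):
    """Windowed self-join over one sensor stream (illustrative sketch).

    stream: iterable of (timestamp, sensor_id, x, y, reading) tuples,
            ordered by timestamp (assumed layout).
    Returns pairs (earlier, later) of tuples from different sensors that lie
    within `window` time units and `max_dist` distance of each other, and
    whose readings differ by at least `min_diff`.
    """
    recent = deque()  # sliding window holding the "other copy" of the stream
    matches = []
    for tup in stream:
        ts, sensor_id, x, y, reading = tup
        # Window: evict tuples too old to be time-correlated with this one.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        for other in recent:
            _, other_id, ox, oy, other_reading = other
            is_nearby = hypot(x - ox, y - oy) <= max_dist
            is_different = abs(reading - other_reading) >= min_diff
            if other_id != sensor_id and is_nearby and is_different:
                matches.append((other, tup))
        recent.append(tup)
    return matches


if __name__ == "__main__":
    readings = [
        (0.0, "a", 0.0, 0.0, 20.0),
        (1.0, "b", 1.0, 0.0, 21.0),
        (2.0, "c", 0.0, 1.0, 95.0),
    ]
    for earlier, later in detect_anomalies(readings, 5.0, 2.0, 50.0):
        print(earlier[1], "vs", later[1])
```

The nested loop compares each new tuple against everything still in the window, which is fine for a small experiment; a real engine would typically index the window, for example by location, to avoid the full scan.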
You could use data generated for the Linear Road benchmark.
View Linear Road Website here.
You could look at data from the temperature sensors from the
CMU SensorNets project,
accessible on the web at this location.
At the University of Wisconsin, some online auction data from eBay
(crawled in late 2001) was collected. It is now stored at CMU.
The actual files are:
Or you can
collect your own data by instrumenting
computer equipment to obtain network traffic traces, video game
control messages (great for spatial data), etc. Or ask other
students what data they may have generated.
If you have great
difficulty procuring a real-world data set that meets the needs of
your project, then you may generate synthetic data
conforming to the Linear Road benchmark or some other benchmark/schema
described in a research paper on data streams.
Some Places for Getting Access to Source-Code
You may want to consider building
your implementation on top of the
Berkeley TelegraphCQ prototype,
which is a publicly available data stream management system.
Or, you could look at
or more likely,
its follow-on system, the
We also have our own WPI stream query engine,
called CAPE, which is available for your use.
The high-level project description can be found here,
though no user documentation is available at this time.
of Major database Conferences.
DBLP (DataBase systems and Logic Programming) Computer Science Bibliography:
(information about database system
conferences, journals and research groups)
DBWORLD (home of DB researchers worldwide.)
Citeseer publications research index:
(Computer and Information science papers)
Mining Data Streams Bibliography:
Relevant Courses Elsewhere
Data Stream Processing (a course at CMU).
Data Stream Management System (a course at Brown Univ).