For those who don’t look at the comments

Comments:
Inspiring blog … The first week at RIT seems a ritual … Happens every time with everyone ..
Am a newbie here at RIT drilled with a quarter of lost weekends, Insomnia and coding nightmares …
Couldn’t help but notice that 4.0 figure …
You must be GOD 😉
Bye, gotta go, Time for some willowy magic from dravid …
Keep the blog going, It makes good reading …

… always amok

Posted by Blade Runner at December 15, 2003 04:29 AM
Dear Newbie to RIT,

First of all, god bless you for adding a comment to an almost barren blog!

Next, don’t worry, your life will return to normal once you have achieved your goal here in Rochester. Soon you will find that spending 30 minutes at a shopping mall will not result in a delayed Algorithms assignment. You will discover to your amazement that money is peripheral, especially when you are no longer paying tuition. With chagrin you will find yourself with better things to do on a Friday night than debugging your Distributed Systems project.

You will, in short, be enlightened as to why in the University of Life, a 4.0 GPA means nothing.

But always remember, RIT CS makes you slog now so that you don’t have to slog later!

Until that time, hold fort!

– Santosh

Posted by Santosh at December 15, 2003 11:06 AM

SIGKDD 2003

KDD 2003 – Accepted Papers
#117 Efficient Elastic Burst Detection in Data Streams
Authors: Yunyue Zhu, Dennis Shasha
#178 XRules: An Effective Structural Classifier for XML Data
Authors: Mohammed Zaki, Charu Aggarwal
#153 Proximus: A Framework for Analyzing Very High Dimensional Discrete-Attributed Datasets
Authors: Mehmet Koyuturk, Ananth Grama
#180 Fast Vertical Mining Using Diffsets
Authors: Mohammed Zaki, Karam Gouda
#264 Towards Systematic Design of Distance Functions for Data Mining Applications
Authors: Charu Aggarwal
#292 On Detecting Differences Between Groups
Authors: Geoff Webb, Shane Butler, Douglas Newlands
#358 Eliminating Noisy Information in Web Pages for Data Mining
Authors: Lan Yi, Bing Liu, Xiaoli Li
#375 Mining Concept-Drifting Data Streams using Ensemble Classifiers
Authors: Haixun Wang, Wei Fan, Philip Yu, Jiawei Han
#390 Maximizing the Spread of Influence through a Social Network
Authors: David Kempe, Jon Kleinberg, Eva Tardos
#457 Privacy-Preserving K-Means Clustering over Vertically Partitioned Data
Authors: Jaideep Vaidya, Chris Clifton
#469 To Buy or Not to Buy: Mining Airline Fare Data to Minimize Ticket Purchase Price
Authors: Oren Etzioni, Craig Knoblock, Rattapoon Tuchinda, Alexander Yates
#326 An Iterative Hypothesis-Testing Strategy for Pattern Discovery
Authors: Richard Bolton, Niall Adams

Are CS folks Mathematicians?

> Since CS is (or at least should be) learning how to apply known algorithms to problems and the development of new algorithms to solve problems, CS should be very similar to math, and computer scientists ought to seem fairly similar to mathematicians.

For researchers in the ‘theory’ and ‘algorithms’ sub-fields of CS, I’d say they are mathematicians. They work with axioms and theorems and stuff just like other mathematicians do.

Other CS researchers are empiricists instead, e.g. most of those who do data mining or statistical natural language processing. And of course there’s lots of other stuff in between. (E.g., network researchers may start off with an algorithmic concept but then run simulations to demonstrate their algorithm’s effectiveness.)

There’s a family of jokes to the effect that PhDs in computer science don’t know anything about computers or programming or whatever. In actuality the individual’s engagement with computers/programming varies a great deal with the sub-field they are in. These days a theorist will need to be able to use LaTeX to write papers and read e-mail to see the conference announcements, but doesn’t need to program at all. OTOH someone doing experiments with genetic algorithms will probably write their own code for their experiments, and may even turn into a hardware geek by building Beowulf clusters to run the massively CPU-intensive experiments on.

> Most early CS people, as I understand it, were math people with an interest in computers.

I think you can still find a lot of older CS professors with degrees in applied mathematics. Computers were around long before CS departments even existed.

“It has become very very clear that this war isn’t over.” — British officer in Iraq, June 24 2003

Final confirmation

I want to write this down, not to gloat, but to realize how many actual days
of my life I must have taken out to get these results.

Distributed Artificial Intelligence – A
Data Mining – A
Theory of Computer Algorithms – A

CGPA – 4.0

Just done with the Bayesian Classifier

I am finally done with the Bayesian Classifier. I have been working on it on and
off for 3 weeks now. Glad to get it over with. So far, I have only been able to
manage a decent accuracy of 82%, with too many false positives. It’s disheartening,
but at this point in time, I feel it is the best that I can do. I thought I should
blog my feelings at this turn. In another 2 hours I have to go home. I’ll go home
and have a bath, having been at the lab all night. I hope my efforts pay off.
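
For anyone wondering why an 82% accuracy can still come with too many false positives, here is a minimal sketch of how the two numbers are counted separately. It is not the classifier from the lab. The "spam"/"ham" labels, the function names and the toy data are all made up for illustration.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # flagged, and really positive
            else:
                fp += 1   # flagged, but actually fine (false positive)
        else:
            if a == positive:
                fn += 1   # missed a real positive
            else:
                tn += 1   # correctly left alone
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all examples that were labelled correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(tp, fp, tn, fn):
    """Fraction of the truly negative examples that were wrongly flagged."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy data, purely illustrative.
actual    = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "spam", "ham", "spam", "ham"]

counts = confusion_counts(actual, predicted)
print("accuracy:", accuracy(*counts))
print("false positive rate:", false_positive_rate(*counts))
```

Accuracy lumps both kinds of mistakes together, so a classifier can score well overall while still flagging a painful share of the good examples.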

First steps towards solid research

Alright friends, I am blogging my first steps towards some meaningful research
today. :))))))

I walked into the lab today and Vineet tells me that our classifier is doing a
100% accurate classification feeding off itself. Yes, I know that does not mean
much, but this opens a whole new set of possibilities. (For those not into data
mining, take my word for it; for the experts, just bear with me!!)

Some of the things we plan to look at:
1. Can we boost the accuracy with intelligent keyword selection?
2. Can we actually use incremental learning algorithms to induce decision trees?
3. Do we need to bias the filters in filtering out junk email towards more
conservative filtering?
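
To make question 3 a bit more concrete, here is one simple way a filter could be biased towards conservative filtering: only call a message junk when the classifier is very sure. This is a sketch under assumptions, not what our filter does. The `classify` helper, the 0.5 and 0.9 thresholds and the scores are all hypothetical.

```python
def classify(spam_probability, threshold=0.5):
    """Label a message 'junk' only if its score reaches the threshold."""
    return "junk" if spam_probability >= threshold else "keep"


# Hypothetical classifier scores for four messages.
scores = [0.55, 0.72, 0.95, 0.30]

# Neutral filter: anything at or above 0.5 is junk.
neutral = [classify(s) for s in scores]

# Conservative filter: demand much stronger evidence before discarding mail.
conservative = [classify(s, threshold=0.9) for s in scores]

print(neutral)       # ['junk', 'junk', 'junk', 'keep']
print(conservative)  # ['keep', 'keep', 'junk', 'keep']
```

Raising the threshold trades missed junk for fewer good messages thrown away, which is usually the cheaper mistake for email.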

Now that we have the infrastructure in place we can answer all these questions.
Thanks Vineet for getting us this far!!!

Discriminative Classifiers

In an attempt to model the brain, Computer Scientists came up with the Neural
Network. One of the interesting things I learned about an NN is that it takes 15
days to ‘learn’ one task or problem (blunt, but a layman’s point of view).
So that would make it probably comparable to the learning rate of a child …
are we there yet?