Just a random bit of sketching I was doing.
I don't think it's a new idea, but building an entire brand around a circus/freak show theme in an entirely unrelated industry could be fun. I never seem to have that kind of persistence when it's my own project, though. Perhaps I should suggest it to a client.
In the midst of running around closing "mad deals, yo" and laughing about new l33tspeak terms like intarwebs and pwned, I've run across a few things over the last month or so.
I really think the person who decided on this had spent the day with a marketing person who filled their head with the term "Upsell."
Just try out the brainstorm function; I've never been able to realize ideas so fast. They also have web sitemap software, but this is truly the "cat's meow" for those of us who need a substantial amount of planning, whether for a client or ourselves.
Data scrapers unite! Our prayers have been answered: the data gods have sent us manna from heaven, and it is called Kirix Strata.
A really solid designer-illustrator guy someone needs to hire
As usual, rather than download one of the small programs I'm certain are available in abundance, I decided to reinvent the wheel to meet one of my clients' needs.
Its primary function is to split large CSV files into smaller ones, repeating the header row in each, so they can be uploaded to a catalog website that limits upload file size.
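The tool itself isn't included here, but the core idea is simple enough to sketch. Below is a minimal Python version, assuming the limit is a byte count and the input is UTF-8 text; `split_csv` and its parameters are hypothetical names for illustration. A single row that is bigger than the limit on its own still ends up in its own (oversized) chunk, since a row can't be split.

```python
import csv
import io
from typing import List


def _row_to_text(row: List[str]) -> str:
    """Serialize one CSV row, quoting fields only where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def split_csv(text: str, max_bytes: int) -> List[str]:
    """Split CSV text into chunks of at most max_bytes (UTF-8), each
    starting with the original header row."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    if header is None:  # empty input: nothing to split
        return []
    header_text = _row_to_text(header)
    header_size = len(header_text.encode("utf-8"))

    chunks: List[str] = []
    current: List[str] = []
    current_size = header_size
    for row in rows:
        line = _row_to_text(row)
        size = len(line.encode("utf-8"))
        # Start a new chunk if this row would push us over the limit.
        if current and current_size + size > max_bytes:
            chunks.append(header_text + "".join(current))
            current, current_size = [], header_size
        current.append(line)
        current_size += size
    if current:
        chunks.append(header_text + "".join(current))
    return chunks
```

Because rows are re-serialized through `csv.writer`, quoting may be normalized (unnecessary quotes dropped, quoted newlines preserved), which a catalog importer usually won't mind, but it's worth a quick check against the site's sample file.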
Built on .NET 2.0. The .NET Framework is not included; no installation is required.
In my work I spend a lot of time gathering data from non-API sources. The upside is that it's very rewarding when you pull it off successfully; the downside is that it can force you into potentially uncharted and ethically challenging territory.
What's the take?
How much data am I going to be scraping? If I'm about to scrape a measly thousand pages, it doesn't take a lot of thought: just get in there and get it. But if I'm confronted with hundreds of thousands or even millions of pages, the strategy becomes very different. Ever tried storing a few hundred thousand files in a single directory? Don't.
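On the single-directory point, one common workaround is to shard files into nested subdirectories keyed by a hash of the page URL. A minimal sketch, assuming pages are keyed by URL and saved as `.html`; `shard_path` and `save_page` are hypothetical helpers, and SHA-1 is used here only to spread names evenly, not for security.

```python
import hashlib
from pathlib import Path


def shard_path(root: Path, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a key (e.g. a URL) to root/ab/cd/<digest>.html.

    The first levels*width hex characters of the digest pick the
    subdirectories, so files spread evenly across 16**(levels*width) folders.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    if levels * width > len(digest):
        raise ValueError("levels * width exceeds digest length")
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest + ".html")


def save_page(root: Path, url: str, html: str) -> Path:
    """Write a scraped page to its sharded location, creating folders as needed."""
    path = shard_path(root, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path
```

With two levels of two hex characters there are 65,536 leaf directories, so a million pages averages roughly 15 files per directory, and the same URL always maps back to the same file.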
What's the security like?
Where are the guards? How might I be noticed? Some websites employ mechanisms to detect the likes of me coming and will automatically block IP addresses for a certain amount of time; showing up on that radar tends to lead to blacklisting. Always have a healthy list of proxy servers or a map to all the local wireless hotspots in your area... your choice.
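Proxies are one answer. Another way to stay off the radar, swapped in here as a more polite alternative, is to slow down and back off whenever a site signals it's unhappy, for example with an HTTP 429 (Too Many Requests) or 503 response, honoring any `Retry-After` value it sends. A minimal sketch of the delay calculation, assuming exponential backoff with "full jitter" and a `Retry-After` already parsed into seconds; the function name and defaults are illustrative, not tuned.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 300.0,
    retry_after: Optional[float] = None,
    jitter: bool = True,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied Retry-After (in seconds) always wins. Otherwise the
    wait doubles with each attempt up to `cap`; with jitter enabled, a random
    point in [0, that wait] is chosen so parallel workers don't retry in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    if retry_after is not None:
        return max(0.0, retry_after)
    # Clamp the exponent so huge attempt counts can't overflow a float.
    ceiling = min(cap, base * 2.0 ** min(attempt, 32))
    return random.uniform(0.0, ceiling) if jitter else ceiling
```

A scraper loop would call `time.sleep(backoff_delay(attempt))` after each blocked response and reset `attempt` to 0 after a success.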
How much time should I take?
Can I get away with it all day, or should I wait until after dark when the customers are gone? A client has expectations, and you really have to sit down and determine exactly what it will take to get the amount of information needed. When you're looking at scraping hundreds of thousands of pages, you'll find that shaving half a second off every 10 pages, or some other minute improvement, adds up very quickly. I usually break such projects into phases like
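To put numbers on "adds up very quickly": half a second per 10 pages is 0.05 s per page, which over a million pages is 50,000 seconds, or nearly 14 hours. A tiny calculator for these estimates, assuming a simple pages × seconds-per-page ÷ workers model that ignores retries, throttling pauses and downtime; the function names are hypothetical.

```python
def crawl_hours(pages: int, seconds_per_page: float, workers: int = 1) -> float:
    """Wall-clock hours for a crawl, assuming work splits evenly across workers."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return pages * seconds_per_page / workers / 3600


def hours_saved(
    pages: int, seconds_saved: float = 0.5, per_pages: int = 10, workers: int = 1
) -> float:
    """Hours saved by shaving `seconds_saved` off every `per_pages` pages."""
    return crawl_hours(pages, seconds_saved / per_pages, workers)
```

For example, `crawl_hours(500_000, 2.0)` is about 278 hours on a single worker, and `hours_saved(1_000_000)` is about 13.9 hours, which is often the difference between a crawl finishing overnight or not.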
Why would anyone want to do this? Aside from malicious uses of screen scraping, like RSS scraping for content generation or plain plagiarism, there happen to be two very good reasons for a legitimate business owner.
First, some data is just not publicly available in a usable format. In this day and age it would seem that screen scraping is becoming less and less of a practiced art, but there are still some darkened corners of the Internet that just beg for someone to come along and put that secluded data to more productive use.
Second, in many instances cost is a big factor. Certain organizations will sell you their data, or allow access to it on a subscription basis, when that same data is simply sitting on a public website somewhere waiting to be plucked. On top of that, the data provided by the source often comes with caveats: under a tiered distribution model, you get the data and only then find out that to get what you really wanted, you have to pay more.
Expect to need more hardware, more custom software and more coffee. Possibly some sort of counseling from time to time. Data scraping for profit can be painfully complex. The wear on you won't compare to the wear on your hardware, though.
For shiggles, I wanted to add a nice little trendy date format to my postings, but everything I found seemed a bit too wordy for me. It always looks like this:
<div class="post-date"> <span class="month">10</span> <span class="day">04</span> <span class="year">1977</span> </div>
This site is dynamically generated, so it wouldn't take much effort to implement such a structure, but I'm stubborn... so here goes:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Drawing" %>
<%@ Import Namespace="System.Drawing.Imaging" %>
<script language="VB" runat="server">
Sub Page_Load(sender As Object, e As EventArgs)
    ' Parse ?dt=...; fall back to today if it is missing or not a valid date
    Dim dt As DateTime
    If Not DateTime.TryParse(Request.QueryString("dt"), dt) Then
        dt = DateTime.Today
    End If

    ' Month abbreviation ("OCT"), zero-padded day ("04") and year ("1977")
    Dim strMonth As String = MonthName(dt.Month, True).ToUpper()
    Dim strDay As String = dt.Day.ToString("00")
    Dim strYear As String = dt.Year.ToString()

    ' 13 cuts it off, which looks cool -- see emersian.com
    Dim baseMap As New Bitmap(95, 13)
    Dim myGraphic As Graphics = Graphics.FromImage(baseMap)
    Dim upBrush As New SolidBrush(Color.Black)
    Dim downBrush As New SolidBrush(Color.SteelBlue)
    Dim monthFont As New Font("Tahoma", 11, FontStyle.Bold)

    ' The rendering hint only affects text drawn after it is set
    myGraphic.TextRenderingHint = System.Drawing.Text.TextRenderingHint.AntiAlias
    myGraphic.Clear(Color.White)
    myGraphic.DrawString(strMonth, monthFont, upBrush, 0, 0)
    myGraphic.DrawString(strDay, monthFont, downBrush, 30, 0)
    myGraphic.DrawString(strYear, monthFont, upBrush, 50, 0)

    Response.ContentType = "image/gif"
    baseMap.Save(Response.OutputStream, ImageFormat.Gif)

    ' Release the GDI+ objects
    monthFont.Dispose()
    upBrush.Dispose()
    downBrush.Dispose()
    myGraphic.Dispose()
    baseMap.Dispose()
End Sub
</script>
<img src="dt.aspx?dt=DATE STRING HERE" />
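For anyone not on ASP.NET, the same month/day/year breakdown is easy to reproduce, and it's worth URL-encoding whatever goes into the `dt` parameter. A minimal Python sketch, assuming the page is fed an ISO 8601 date such as `1977-10-04`, which should be unambiguous for the .NET side to parse; `date_parts`, `date_image_url` and `date_img_tag` are hypothetical names, and the English month abbreviations are hard-coded to avoid locale surprises.

```python
import datetime
from typing import Tuple
from urllib.parse import urlencode

# Hard-coded English abbreviations so output doesn't depend on the locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def date_parts(d: datetime.date) -> Tuple[str, str, str]:
    """Split a date into the month/day/year pieces drawn on the image."""
    return MONTHS[d.month - 1], f"{d.day:02d}", str(d.year)


def date_image_url(d: datetime.date, page: str = "dt.aspx") -> str:
    """Build the image src, URL-encoding the date as ISO 8601."""
    return f"{page}?{urlencode({'dt': d.isoformat()})}"


def date_img_tag(d: datetime.date) -> str:
    """Full <img> tag with the date spelled out in the alt text."""
    month, day, year = date_parts(d)
    return f'<img src="{date_image_url(d)}" alt="{month} {day} {year}" />'
```

For the date in the markup example above, `date_img_tag(datetime.date(1977, 10, 4))` produces `<img src="dt.aspx?dt=1977-10-04" alt="OCT 04 1977" />`.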
The image to load is designated in the Onload of the "holder" sprite:
img.loadMovie("your file here");
Don't lose sight of what is actually important to your survival and what is not. So many of us get caught up in the clutter of various advertising scenarios and side projects that we can easily forget how we started and who our bread-and-butter customers really are. I recently reviewed the last few years of accounts and realized that one of my most neglected sources of income had added up to as much as my largest client pays. Needless to say, I have reinvested my efforts in it and am beginning to see real results.
Somewhere I once read that in five years you will be the people you associate with, the books you read and the music you listen to. This sounds a bit harsh, but I have to admit that I have seen it firsthand, and it's solid advice.
Surround yourself with the people you admire. Collaborate and invite critique. Seek out those who challenge and inspire you.
Getting dismayed is natural; it's those who keep their eye on the prize who prevail. Fundamentally, you can't lose if you don't play, but you can't win either. Keep doing what you love, keep trying new things, and your day will come.
Many of the most successful businesses around, especially all these web startups, are founded on the idea of fixing an existing problem or adding a functionality that was needed. See a problem, fix it. Just look at 37signals: who knew that the tool they built to manage their own client projects would turn into Basecamp, and then Ta-da List, Highrise, etc.?
Darren Rowse, who lives in "The House that Google Built", started only two years ago. Sounds crazy, doesn't it?
Know your exit. Know your exit. Know your exit.
I am not saying break the rules, but be certain to push them all the way to their limits. No one ever got rich without pushing the boundaries of either customers or industry. This is absolutely true of today's online businesses. Look at the most successful eBayers, with thousands and thousands of items listed, or bloggers with hundreds of sites. Google isn't just sitting back and raking it in; it is constantly pushing the boundaries of what it can get away with. You should too.
A wrong is a wrong is a wrong; don't be afraid to admit when you are wrong. As long as you learn from your wrong decisions, they are valuable decisions.
Maybe a bit of both is really needed, but so many seem to fall into the pattern of playing it safe when what is needed is a little bit of courage.
Despite what people may think, the Internet is still very much like the Wild Wild West.
I am certain I left out a ton of great lines from gangster movies that could be added here, perhaps you know one?
Data visualization is such a help sometimes. I don't know why Google doesn't use its Chart API in AdSense reporting, but having experimented with the Chart API a bit while developing this site, I decided to put it to work on some genuinely valuable research.
Reviewing Jan 1 through Apr 15 of this year and last, I have come to the conclusion that server downtime is my biggest enemy.
It's not really my hosting company's fault, though. I am just endlessly tweaking code and trying it out on the production server when I should be using a test environment first. Well, that changes today: no more tweaking without testing. Hunkering down and reinvesting myself in all this has been great so far; I can't believe how motivated I am.
I hope you folks will bear with the breakneck pace of posts lately; I'm just trying to keep up with my own mind.
Now if only I could get past this: "Starting Sep 13, 2007, only websites with over 100,000 daily page views across user pages will be eligible to participate in the AdSense API program." Then I could really have something.