Sunday, June 25, 2006

Newspaper Design Algorithm

As I was mentioning in an earlier blog entry, the part of the FeedJournal project which I have been feeling most insecure about is how to design the algorithm for laying out the articles in the newspaper. This is a critical step for a number of reasons: it has to look like a newspaper, it has to read like a newspaper, and it has to be pretty (the output PDF is essentially a part of the FeedJournal GUI).

Anyway, I am happy to report that significant progress has been made in this area. I developed an algorithm which dynamically creates a newspaper with customizable:

  • number of paragraphs
  • margins
  • paper size
  • font
  • spacing between rows and various article elements

I have also implemented support for a headline font size which is a function of the article's importance/size.

Together with the masthead (newspaper lingo for the first page logotype) the whole creation starts to look pretty snazzy, if I may say so myself.

The algorithm implementing the layout is pretty simple, but efficient. First, I gather the collection of articles which are due to be published in the upcoming issue. I sort these according to size/importance; this step is made very simple by the new generic collection classes in .NET 2.0. Then I take the first article which fits into the next available space on the current page, remove it from the collection and mark the page space as occupied. If no article fits the remaining space, I publish as much of an article as fits on the page and add a page jump to another page where the article is continued. Basically that's all there is to it, and this algorithm works very well so far.
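The steps above can be sketched roughly as follows. This is only an illustration in Python, not the FeedJournal code (which is C#): the names `Article`, `Page` and `lay_out` are made up for the example, a page is simplified to a single column measured in lines, and "importance" is assumed to be the same thing as size.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    size: int  # hypothetical unit: lines of column space needed


@dataclass
class Page:
    capacity: int
    # Each entry is (title, lines_used, continues_on_later_page).
    placed: list = field(default_factory=list)

    @property
    def free(self):
        return self.capacity - sum(lines for _, lines, _ in self.placed)


def lay_out(articles, page_capacity):
    # Largest (assumed most important) articles first.
    queue = sorted(articles, key=lambda a: a.size, reverse=True)
    pages = [Page(page_capacity)]
    while queue:
        page = pages[-1]
        if page.free == 0:
            pages.append(Page(page_capacity))
            continue
        # First article, in priority order, that fits the remaining space.
        fit = next((a for a in queue if a.size <= page.free), None)
        if fit is not None:
            queue.remove(fit)
            page.placed.append((fit.title, fit.size, False))
        else:
            # Nothing fits: print what fits of the top article and
            # continue the rest on the next page (the "page jump").
            head = queue.pop(0)
            used = page.free
            page.placed.append((head.title, used, True))
            queue.insert(0, Article(head.title, head.size - used))
            pages.append(Page(page_capacity))
    return pages
```

For example, `lay_out([Article("A", 70), Article("B", 50), Article("C", 30)], 100)` puts A and C on the first page and B on the second, while a single 120-line article on 100-line pages is split with a jump to page two.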

In one of the coming blog posts I will attach a sample PDF file to showcase FeedJournal.

Monday, June 19, 2006

SQL Server Strangeness

So I got up at 4:30 today in order to get some serious work done on FeedJournal before the baby woke up and I had to go to work. Well, that was my plan at least. The first part went fine; I got out of bed, went for a short run and showered. By then the baby was awake though, so I had to do some multitasking with one arm holding the baby and the other hacking away at the keyboard. But that's actually much nicer than it sounds. Seriously.

Pretty soon I ran into problems with my SQL database, which had been working flawlessly until now. Whenever I tried to connect to it I got an SqlException: "Failed to generate a user instance of SQL Server due to a failure in starting the process for the user instance. The connection will be closed." At first I thought it was due to an incorrect connection string, but everything seemed fine and the database was in the right location. So I hit the waves of the WWW to try to find anyone with similar problems out there. Loads of people had run into the same problem, but in most cases they were using Remote Desktop, which was the root of their problems. I'm not using that, so I was left to my own devices again.

I tried to restart the development environment. I tried to manually restart the SQL Server services, but to no avail. It wasn't until I did a reboot that the problem went away, and now everything is working fine again. Crossing my fingers.

Sunday, June 18, 2006

NP-Complete

I have been sick with the flu for the last week and still don't feel so great. For this reason the programming hasn't really proceeded as I expected. However, I have been doing a lot of thinking in my head about the database and class designs. As soon as I feel better I will work on laying out the PDF newspaper dynamically, which I realize will be a tough nut to crack.

Basically the problem is related to the classic computer science problem of bin packing, which is NP-hard (its decision version is NP-complete). NP stands for "non-deterministic polynomial time", and the NP-complete problems are the hardest problems in that class. In practice it means that no efficient algorithm is known that always finds the optimal solution. My approach will be to take some shortcuts and make compromises so that the layout will be acceptable from a design viewpoint, while not digging myself into a hole with an overly complex layout algorithm.
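To give an idea of what such a shortcut can look like, here is a minimal Python sketch of first-fit decreasing, a well-known bin packing heuristic. It is shown purely as an illustration of the kind of compromise meant here, not as the approach FeedJournal ends up using; the function name and the integer sizes are assumptions for the example.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack item sizes into bins of the given capacity.

    Greedy heuristic: handle the largest items first and drop each one
    into the first bin that still has room. Fast, but not guaranteed to
    use the minimum number of bins.
    """
    bins = []  # each bin is a list of item sizes
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError("item larger than a bin: %r" % size)
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([size])
    return bins
```

With pages as bins and articles as items, the heuristic gives a reasonable packing in a single pass over the sorted articles instead of searching through every possible arrangement.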

In the meantime, while waiting for the fever to go away, I am reading some academic papers on newspaper layout and bin packing solutions. I don't think it will help my sickness, but it does make me sleepy.

Sunday, June 11, 2006

.NET multithreading

FeedJournal, like any RSS aggregator, needs to be efficient when it is updating the list of subscribed feeds. It is obvious that a sequential polling of feeds (check each feed and proceed to the next after finishing with the previous) will be sub-optimal in terms of performance and user experience. The internet requests will need to occur in parallel for optimal performance. However, if your feed subscription list contains more than a trivial number of feeds, you don't want to congest your Internet line with all of these requests at the same time.

IE7's RSS infrastructure calls this throttling, and it limits the number of concurrent web requests to 4. I don't see a reason to differ from this approach and implemented the same system.

One of the great things about .NET 2.0 is the easy-to-use infrastructure for multithreading and thread synchronization. By just dragging the BackgroundWorker component to the main form of my application I have all the threading support I will ever need. Starting a new thread in the application's process is then as easy as calling RunWorkerAsync() on the BackgroundWorker object. This will queue a new thread to be executed as soon as there is an available slot.

In FeedJournal's case the scenario becomes a little bit more complicated: if I let the method that iterates over the feed updates run in the same thread as the GUI, the GUI will become unresponsive. Therefore the main thread calls RunWorkerAsync(). The background worker iterates over all feeds and calls ThreadPool.QueueUserWorkItem for each feed.

Throttling the maximum number of concurrent downloads is then simply achieved by the line:

ThreadPool.SetMaxThreads(4, 1000);

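telling the ThreadPool to have a maximum of 4 worker threads (and 1000 I/O completion threads) executing at any given time. Keep in mind that this limit applies to the whole process, and that SetMaxThreads refuses values lower than the number of processors in the machine.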

Wednesday, June 7, 2006

FontDialog and Font Paths

One of FeedJournal's system requirements is to be able to customize the fonts in the PDF newspaper. Honestly, I thought that this would be a piece of cake. But I ran into some problems...

In the PDF file you can specify Type 1 and Type 2 fonts. Type 1 are common fonts, such as Courier, Helvetica and Times Roman. Adding these fonts is very straightforward since the PDF format supports them natively.

However, the difficulties begin when the user should be able to select any font. The Type 2 fonts have to be specified using an absolute path to the font file (which can be either TrueType or OpenType). No problem, right? Yeah, that's what I said yesterday too. I tried to simply add a FontDialog to my settings form. From the control I wanted to get the selected font's absolute path. No dice... No matter how hard I looked for a suitable property in .NET's Font class, it simply didn't exist. Actually I didn't find any property in the Font object from which I could deduce the correct path (there is not necessarily a correlation between the font's name and its filename). I was scratching my head for a long time until I came up with a solution, which is working well.

I created a static class called InstalledFonts, which upon startup gets the path in the SystemRoot environment variable and iterates over all *.ttf files in the Fonts subdirectory. For each file it finds, it tries to load it into a PrivateFontCollection and checks which styles are associated with it. Each font found is added to a Dictionary mapping fonts to their file path. Later on I simply use this dictionary to look up the path of the font selected in the dialog.

Monday, June 5, 2006

Project Management with ToDoList

Reading my fellow finalist Douglas Steen's entry about bug tracking tools, I totally agree with him that it would be great to have a lightweight bug-tracking tool built into Express. Sure enough, we have the Task List pane where tasks can be sorted and given a priority, but that doesn't really accomplish anything substantial.

Douglas chose a web-based bug tracking system and he mentioned another web-based system. Hunting the Internet will lead you to yet other web-based systems. Why do 99% of bug-tracking systems have to be run in the browser? I hate the browser: it is less responsive than a native Windows application, and it usually lacks a menu and has quirky keyboard support.

Just because a system is multiuser doesn't mean that the browser has to be the only interface. The big advantage I see in using the browser is that no client software needs to be installed and that multiple operating systems are supported, but I would happily trade this for a native Windows interface.

I was hunting high and low for the Holy Grail of bug tracking systems until a couple of months ago, when I finally discovered a wonderful freeware application called ToDoList.

ToDoList is the perfect application for a single developer who wants to manage any project. The interface is a bit on the complicated side, but can be customized to suit your own requirements. For each hierarchical item you can add formatted comments, priority, estimation, tags, dependencies, etc. There is also a possibility to export the task list to XML for web publication, along with a zillion other neat features.

ToDoList is hosted on CodeProject and is being actively developed, with new features arriving at a steady pace. Try it, you won't regret it! I am actually writing each draft of my blog entries inside ToDoList.

The Feed Format Jungle

I have started the implementation of my project in C# Express Edition, and one of the first things I have stumbled upon is the frustration of having to deal with many different XML feed standards. There are RSS and Atom, each of them with several different sub-versions. But that's not all. We also have a slew of Internet cowboy hackers who don't have any desire at all to follow these standards. In short, RSS/Atom land is a jungle. Time to take out the machete! When researching the options for a suitable machete for the feed jungle, the following four caught my attention:

  1. Atom.NET + RSS.NET
  2. IP*Works
  3. Microsoft's RSS library, included in IE7
  4. Rolling my own component based on .NET's XML support

Atom.NET + RSS.NET

These are two separate open-source libraries, implemented in C# .NET, which enable users to work with the two feed standards and all of their sub-standards through a .NET programming interface. Unfortunately the two components expose two interfaces without much similarity. In addition to this, the libraries are not in active development any longer. Instead the author is creating a commercial closed-source version of the components.

IP*Works

When you register your copy of Visual Studio 2005 Express Edition, one of the freebies that Microsoft offers you is a license for IP*Works' RSS component. The word free misled me for a while, until I realized that I was being offered a free developer license only, without any rights to distribute the component with the applications you build in Express.

Microsoft's RSS library, included in IE7

With the upcoming Internet Explorer 7 (included in Beta2), Microsoft has really outdone themselves with the RSS/Atom support. Included in the browser will be a feed repository that any application can use to know which feeds are of interest to a user. Also articles and their read/unread state will be stored here. However, IE7 requires Windows XP or above, cutting off a large piece of the current end-user segment.

Rolling my own component based on .NET's XML support

Of course, being a developer, you are always attracted by the possibility of rolling everything yourself. However, considering the abundance of RSS/Atom formats out there, attempting this would be suicide given the short time available to build FeedJournal within the contest.

Conclusion

After some prototyping with the different options I decided to go with the open-source Atom.NET and RSS.NET components. However, I quickly noticed some bugs and limitations, which I fixed in the components (the wonder of open source!). I am wrapping Atom.NET and RSS.NET in my own classes "Article" and "Feed", which have different constructors for the different feed types.
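The wrapping idea can be illustrated with a small Python sketch. It is not the FeedJournal code and it does not use Atom.NET or RSS.NET: it parses the XML directly with the standard library, handles only RSS 2.0 and Atom 1.0, and keeps just a title and a link. The `Article` name matches the class mentioned above, but its fields and the `from_rss_item`, `from_atom_entry` and `parse_feed` helpers are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 XML namespace


@dataclass
class Article:
    title: str
    link: str

    @classmethod
    def from_rss_item(cls, item):
        # RSS 2.0: <item><title>...</title><link>...</link></item>
        return cls(item.findtext("title", ""), item.findtext("link", ""))

    @classmethod
    def from_atom_entry(cls, entry):
        # Atom 1.0: the URL lives in the href attribute of <link>.
        link = entry.find(ATOM + "link")
        href = link.get("href", "") if link is not None else ""
        return cls(entry.findtext(ATOM + "title", ""), href)


def parse_feed(xml_text):
    """Return a list of Article objects, whatever the feed format."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return [Article.from_rss_item(i) for i in root.iter("item")]
    if root.tag == ATOM + "feed":
        return [Article.from_atom_entry(e) for e in root.findall(ATOM + "entry")]
    raise ValueError("unrecognized feed format: " + root.tag)
```

The point of the design is that the rest of the application only ever sees `Article` objects and never needs to know which feed format they came from.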

Sunday, June 4, 2006

Ode to Visual Studio Express Edition

Using the Visual Studio Express editions to build a software product is a delight. I have been using Visual Studio for years, and can testify to the great quality of Microsoft's development tools. When asked about my favorite application across all categories, I always answer VS. With the latest Express editions, Microsoft have outdone themselves again. Besides making the IDE available for free, they have added many important new features to this package.

My hands-down favorite feature must be the built-in refactoring support. I have been a huge fan of Martin Fowler's groundbreaking book "Refactoring" since its publication in 1999. Since I have been mainly developing in C/C++ during my career, I have not had the privilege of using any refactoring tool professionally (refactoring is dependent on reflection support, which is difficult to achieve, if not impossible, in C++ with all of its powerful features). With the advent of .NET and the initial versions of Visual Studio .NET came the first 3rd party commercial refactoring tools, which were pretty decent but costly. I am very pleased to see Microsoft taking this step and integrating refactoring support within the IDE. The most common refactoring, "Extract Method" (to make a selected part of a method a new method), is included, as well as "Rename class/method/variable". The only refactorings I am really missing from this basic package are "Extract Subclass" (to create a new class from a set of class methods) and "Move Method" (to move a method to a different class).

Other neat new things I like about the Express edition are the keyboard customizations, code snippets and code templates.

OK, that's all fine and dandy, but what can be improved? Well, I realize we are talking about first-class software that is being given away free of charge here, but it still annoys me that there is no way to integrate source control in the Express editions. I would have liked to run the free Subversion system integrated in C# Express. It is also annoying that add-ons are not officially supported, although a few notable exceptions exist: NUnit and SQLite both use unofficial workarounds to enable their components to integrate with the Express Editions.

Friday, June 2, 2006

Domain Names

OK, time to return to the blogosphere after my honeyweek with the baby.

I have set up a web site at feedjournal.com where all things related to my project will be collected and presented. In the meantime I have put up some basic information together with the project goals. I bought the domain from GoDaddy.com, and it was a very straightforward process.

feedjournal.com was actually not my first choice of product/domain name. I initially had my eyes set on a different name, but the .com name was taken. Or rather not taken, but parked, like almost all decent .com domains today. It's pretty frustrating to see your candidate names turn out to be taken one after the other, and when you try the more esoteric names you find them taken as well. And then you try the really absurd names, and sure enough, none is available. Not that these domain names are in use; many are just bought by companies who sell them on for much higher prices.