Archive for June 2006

June 29 Update

I’ve been working on the pedigree display code. It’s still not entirely clear in my head, which makes it hard to put together. :) At the moment I’m taking the database and exporting the selected individuals/families into an XML tree, which I’ll then be able to parse into a pedigree in HTML/CSS. I don’t know if this is the best way to go about it, but if it proves to be a problem, I’ll come up with something better. In the meantime it’s probably more prudent to go ahead with it and get *something* up, which I can then tweak as necessary.
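Here's a rough sketch of what the export step might look like, in plain Ruby rather than the actual Rails code. Everything in it is an assumption for illustration: the `PEOPLE` hash stands in for the database, and the `person` element with `id`, `role`, and `name` attributes is a made-up XML shape, not a settled format.

```ruby
# Hypothetical stand-in for the database: id => record (placeholder names).
PEOPLE = {
  1 => { name: "Child", father_id: 2, mother_id: 3 },
  2 => { name: "Father", father_id: 4, mother_id: nil },
  3 => { name: "Mother", father_id: nil, mother_id: nil },
  4 => { name: "Grandfather", father_id: nil, mother_id: nil }
}.freeze

# Escape the characters that are unsafe inside an XML attribute value.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Emit one <person> element with its known parents nested inside it,
# stopping after the requested number of generations.
def person_xml(people, id, generations, role = "subject", indent = 0)
  person = people[id]
  return "" if person.nil? || generations <= 0

  pad = "  " * indent
  parents = { "father" => person[:father_id], "mother" => person[:mother_id] }
  inner = parents.map do |parent_role, parent_id|
    person_xml(people, parent_id, generations - 1, parent_role, indent + 1)
  end.join
  tag = %(#{pad}<person id="#{id}" role="#{role}" name="#{xml_escape(person[:name])}")
  inner.empty? ? "#{tag}/>\n" : "#{tag}>\n#{inner}#{pad}</person>\n"
end

puts person_xml(PEOPLE, 1, 3)
```

Since the XML nests parents inside children, the HTML step could walk it the same way and emit nested elements for CSS to position.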

(So much for imaginative titles. Oh well. :))

Another quick update

Got the family links working. One other problem with having so many relationship links is that the code gets a little ugly. Hmm. I’ll have to see if there are any alternatives that 1) still give me enough flexibility and 2) are beautiful. Anyway, coding Ruby on Rails is a lot of fun now that I’m actually coding instead of just thinking about it. :)

A quick update

I’ve been working on importing the family links from the GEDCOM. Almost there. Once I get that into the database, the pedigree display will be next. I’ve worked out most of the algorithm, though I haven’t yet decided whether it’ll be better to use a flat array for the data (and pull them out by ID) or use a tree (XML DOM, most likely). We’ll see… I should have more time tomorrow to work on this. I’m finding that large blocks of time (two or three hours) are when I get the most work done, especially once I get into the groove. Maybe I’ll start waking up at 3 or 4… Or maybe not. :)
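To make the flat option concrete, here's a minimal sketch, assuming records live in a Hash keyed by ID with `father_id` and `mother_id` fields (hypothetical names and sample data). It assigns each ancestor an Ahnentafel number (the subject is 1, the father of person n is 2n, the mother is 2n + 1), so every slot on the chart has a predictable position without building a tree.

```ruby
# Flat storage: every record sits in one Hash keyed by ID (placeholder data).
PEOPLE_BY_ID = {
  10 => { name: "Child", father_id: 20, mother_id: 30 },
  20 => { name: "Father", father_id: 40, mother_id: nil },
  30 => { name: "Mother", father_id: nil, mother_id: nil },
  40 => { name: "Paternal grandfather", father_id: nil, mother_id: nil }
}.freeze

# Walk the flat Hash breadth-first and map Ahnentafel number => name.
def pedigree_slots(people, root_id, generations)
  slots = {}
  queue = [[1, root_id]]
  until queue.empty?
    number, id = queue.shift
    person = people[id]
    # Skip unknown parents and anything past the last requested generation.
    next if person.nil? || number >= 2**generations

    slots[number] = person[:name]
    queue << [2 * number, person[:father_id]]
    queue << [2 * number + 1, person[:mother_id]]
  end
  slots
end

slots = pedigree_slots(PEOPLE_BY_ID, 10, 3)
# Slots with no entry are the holes in a three-generation chart.
holes = (1...2**3).to_a - slots.keys
puts slots.inspect
puts "Missing slots: #{holes.inspect}"
```

A tree would give the same result by recursion; the flat version just trades nesting for ID lookups.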

OneWorldTree

I spent a little bit of time on Ancestry’s OneWorldTree today, checking out one of my lines (Abram W. Houchins and Martha Sneed). Turns out there’s a huge amount of information on their ancestors (thousands of names). Now, I don’t know that all of it is correct, of course, but it’s a start (and it’s a lot easier to verify it when I know what it is I’m looking for).

Why do I bring this up? Two reasons. First, there’s no way to download it from OWT. No link to download a GEDCOM, nothing. Annoying. I’ll have to go through it all by hand. There are 50+ generations. Yes, that’s right. Good thing it’s summer.

The other reason is that when you have that much genealogy, it would be really, really, really nice to get an overview of what’s there, where the holes are, etc. So I think Beyond needs supercharts, which will be overviews of your entire tree (or whatever part of your pedigree you choose, let’s say), shrunk to the size of a single page. I don’t know yet if I’ll just use a really tiny font size or if I’ll replace the names with something else (somehow). (Ideally it would actually be legible text, and since it’ll be PDF I’ll also include the option of printing at a larger size, 5 feet across or what have you.)

And finally, this is addicting. :) I’ve written on Footprints from the Past about what I found, and it’s really fun. Even if it’s not all accurate. (And half the fun will be verifying it and finding out that it really is true, or isn’t, or whatever the case may be.) I need more time!

At any rate, I’m really looking forward to being able to pull all this external information into my Beyond tree, greyed out so I know it’s not verified. With PAF it’s too hard to see at a glance what’s real and what’s not. I plan to go through OneWorldTree when I get Beyond to a usable state, seeing what’s been done on all the lines in my tree. Then I can add it all in, conflicting information and all, and start the work of verifying without worrying about missing stuff. Ah, it’ll be great!

On relationships

I’ve been working on Beyond for the past five hours or so. Worked on the database schema, realizing I’d forgotten about translations and a few other things (tags). Split the schema into four separate stages, which roughly parallel development on the program. Then I started work on a little Ruby on Rails app to create the database (via migrations) and populate it (by loading an XML file created from a GEDCOM). So far it’s working okay, and it’s giving me an opportunity to rethink some decisions.

As it stands, the current model has everything (name, gender, UID, etc.) as a characteristic which gets linked to the person via a relationship. So the People table itself only stores the ID, really. This means lots of characteristics and even more relationships. Hmm… The flexibility of the current relationships table means I can relate any two records in the database (two events, or an event and a picture, for example). But is that even a good idea? I guess my main concern is having a huge, unwieldy relationships table. We’ll have to see if the benefits of flexibility outweigh the downsides.
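A small plain-Ruby sketch of that model, just to show where the row count comes from. The struct fields (`from_table`, `from_id`, `to_table`, `to_id`, `kind`) are names invented for this example, not the actual schema.

```ruby
# Hypothetical row shapes; the real tables may look different.
Characteristic = Struct.new(:id, :key, :value)
Relationship = Struct.new(:from_table, :from_id, :to_table, :to_id, :kind)

characteristics = [
  Characteristic.new(1, "name", "John Smith"),
  Characteristic.new(2, "gender", "M")
]

# One relationship row per characteristic, plus rows linking any other
# pair of records (here an event and a picture).
relationships = [
  Relationship.new("people", 1, "characteristics", 1, "has"),
  Relationship.new("people", 1, "characteristics", 2, "has"),
  Relationship.new("events", 7, "pictures", 3, "illustrated_by")
]

# Reassemble a person by following relationship rows to characteristics.
def characteristics_for(person_id, relationships, characteristics)
  ids = relationships.select do |r|
    r.from_table == "people" && r.from_id == person_id && r.to_table == "characteristics"
  end.map(&:to_id)
  characteristics.select { |c| ids.include?(c.id) }.map { |c| [c.key, c.value] }.to_h
end

puts characteristics_for(1, relationships, characteristics).inspect
```

Every fact about a person costs two rows (one characteristic, one relationship), which is where the worry about an unwieldy relationships table comes from.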

Anyway, now that I have some real data, I’ll be working on integrating it with the mockups (pedigree and so on). And then, after I figure out a good navigation scheme, you’ll be able to load a GEDCOM and view it online. Small steps. :)

Data model redux

Having thought about it some more, I’ve decided to go with a loose data model. The advantage is flexibility: people will be able to store whatever they need to, in a way that makes sense to them. As far as interoperability goes, yes, I think there will be some problems, but I have a feeling they won’t be too bad, especially with the template system (where there’ll be “expected” fields, like “First Name” and “Gender” and such). Another concern is data analysis, but I have some ideas on that.

Let me see if I can describe this clearly. When you go to the detail page for an individual (or a family, or an event), you’ll be able to organize the information about that person into groups (“Vital Events,” “Other Information,” whatever you want to call them). Each group contains items, which are either metadata (key/value pairs like “Hair color”/“brown” or “Religion”/“Baptist”), events (which in turn contain dates, places, and other metadata about the event), or research items (to-do lists, images, files, notes, etc.). You can order the groups and items any way you want via drag-and-drop.
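In code, the groups-and-items idea might look something like the sketch below. The `Group` and `Item` structs, the three `kind` symbols, and `move_item` are all hypothetical; the drag-and-drop itself would happen in the browser, with something like `move_item` recording the new order.

```ruby
# The three kinds of item described above.
ITEM_KINDS = %i[metadata event research].freeze

Item = Struct.new(:kind, :label, :data)
Group = Struct.new(:title, :items)

# Build an item, rejecting kinds the model does not know about.
def build_item(kind, label, data)
  raise ArgumentError, "unknown kind: #{kind}" unless ITEM_KINDS.include?(kind)

  Item.new(kind, label, data)
end

# Reorder a group by moving the item at index `from` to index `to`.
def move_item(group, from, to)
  item = group.items.delete_at(from)
  group.items.insert(to, item)
  group
end

vitals = Group.new("Vital Events", [
  build_item(:event, "Birth", { date: "1775", place: "Hartford, Connecticut" }),
  build_item(:metadata, "Religion", "Baptist"),
  build_item(:research, "To-do", ["Find christening record"])
])

move_item(vitals, 2, 0)
puts vitals.items.map(&:label).inspect
```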

I’m not entirely sure this’ll even work, but I’ll go forward with it and see how the tests go. If it flops, I’ll fix it. Collaboration could get interesting… Granted, I don’t think everyone is going to move things around all the time, and I suspect that most will stick with the standard template (vital events first — birth, christening, death, burial, etc. — and so on). But the flexibility’s there for those who need it. If you want to add a “Got Eagle Award” event, you can. If you want to add a to-do list for a particular ancestor (or a particular family), you can. If you want to add a table with census results from 1830 to 1870 for your ancestor, you can put it right there with the rest of his data, if that helps you. (And if it doesn’t help you, you can put it on a research page instead, keeping things separate.) My goal with Beyond is to set up as loose a framework as possible, just the foundation, and then get out of the way and let users do things the way they want to do them.

A handful of ramblings

First off, I’ve been reading the GENTECH Data Model spec. I read it years ago, when I was working at Ancestry, but time erases a lot of details. :) Anyway, it’s interesting food for thought. I don’t think I’ll end up adopting it (at least not wholesale), but it does have a lot of good ideas. I like the idea of being able to associate dates and places with characteristics (so you can say, “John Smith was a farmer from 1730 to August 1749 in Hartford, Connecticut”).

OpenID caught my interest today, primarily because of the easy sign-in capability. I’m still not entirely sure how it works, or if it’s even desirable for Beyond (genealogy may be a touchy area as far as that goes), but it’s definitely an option. I do plan on having MicroIDs embedded in the header of users’ pages, which’ll make it easy to use claimID to say “This is my genealogy.”

Coding-wise, I took some Ruby code to convert GEDCOM to XML and started writing some classes which convert the XML to Ruby objects. (Eventually the XML will disappear, of course; this is just a temporary hack to get some data to work with.) Once that’s done, I’ll write code to import the Ruby objects into the database, and then it won’t take long before the prototype goes live.

Speaking of the database, I’m almost done drafting the data model (and GENTECH is influencing a few things here and there). One issue I’ve come up with a tentative solution for is sources. Ideally, you should be able to add a source to any bit of data that could reasonably have a source. So, to that end, I think the Sources table is going to be open-ended: instead of having a set list of types, there’ll be an “object_id” field and an “object_table” field, which means I can add a source to anything that shows up in a table.

But that’s still not enough, at least not yet. For example, events will include fields for the date and the place, preventing a source from being added specifically for the date (or the place), and instead forcing it to be added for the event as a whole. Hmm. The idea of being able to source everything is really nice, but is it feasible without turning the database into a mess?
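Here's the open-ended Sources idea as a plain-Ruby sketch. The `object_table` and `object_id` fields come from the description above; the sample rows, titles, and the `sources_for` helper are made up for illustration. (Rails has a similar idea in its polymorphic associations, though I'm not assuming that's how this will end up being built.)

```ruby
# Hypothetical source rows: each one points at any row in any table.
SOURCES = [
  { id: 1, title: "Parish register (hypothetical)", object_table: "events", object_id: 7 },
  { id: 2, title: "Family Bible (hypothetical)", object_table: "people", object_id: 1 },
  { id: 3, title: "Census page (hypothetical)", object_table: "events", object_id: 7 }
].freeze

# Find every source attached to one row of one table.
def sources_for(sources, table, id)
  sources.select { |s| s[:object_table] == table && s[:object_id] == id }
end

# Both sources attach to event 7 as a whole; nothing here can say
# "this one supports the date" versus "this one supports the place".
puts sources_for(SOURCES, "events", 7).map { |s| s[:title] }.inspect
```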

One last thing. For storing information about individuals, I’m thinking about using a similarly flexible system: everything gets stored as a key/value pair. So instead of having set fields, you’d just add a “first_name” key and fill in the value. If there’s no middle name, you don’t have to add a middle name. The advantage is that you don’t have to use fields you don’t need, and you can use other fields that you do need (and that I’ve never heard of). The disadvantage comes in displaying the information and ordering it into groups. But maybe I could include a groups table, so you could put all the name information and gender and birth/death information into a “Vitals” group (or whatever you want to call it). Hmm… I’m considering the possibility of using templates to make this kind of thing easier for newbies — a scaffolding with common keys already in place for you to use.
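A quick sketch of how a template could sort free-form key/value pairs into groups for display. The `TEMPLATE` contents, the "Other" catch-all group, and the `grouped` helper are assumptions for this example only.

```ruby
# Hypothetical template: group title => the keys expected in that group, in display order.
TEMPLATE = {
  "Vitals" => %w[first_name middle_name last_name gender birth death]
}.freeze

# Sort a person's key/value pairs into template groups. Keys the person
# doesn't have are skipped; keys the template doesn't know go to "Other".
def grouped(facts, template)
  known = template.values.flatten
  groups = template.map do |title, keys|
    present = keys.select { |k| facts.key?(k) }
    [title, present.map { |k| [k, facts[k]] }.to_h]
  end.to_h
  other = facts.reject { |key, _value| known.include?(key) }
  groups["Other"] = other unless other.empty?
  groups
end

facts = { "last_name" => "Smith", "first_name" => "John", "occupation" => "farmer" }
puts grouped(facts, TEMPLATE).inspect
```

No middle name is stored, so none is shown, and "occupation" still has somewhere to go even though the template has never heard of it.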

End brain dump. :)

One small step for man

Last night as I was sitting in front of the Manti temple waiting for the Mormon Miracle Pageant to start, I realized I had a perfect opportunity to work on the database design. And so I did. I’ve got enough of it done that I can start implementing it and get some simple prototypes up and running. And finals are now officially over, so I’ll have much more time to work on this. :D

Coming up with titles for these little updates is hard. ~sigh~

Dates and stuff

I’ve been reading the W3C’s internationalization (i18n) page, since Beyond will be a global app almost from the get-go. It’s really important that it support not only other languages, but also the cultural aspects that come with them — date and time formats and script direction, for example.

As far as dates go, don’t forget that genealogy often deals with incomplete dates: just a year (“1775”), or just a month (“March”), or ranges (“before 1933”, “about 1854”). This makes date storage just a wee bit tricky, since the standard SQL date format wants more detail. Storing as a string takes care of the flexibility, but searching by date then becomes harder if you include ranges on the search form (1816 +/- 5 years). I’m still not sure how this’ll work. (Gee, I seem to be saying that a lot lately, don’t I. :))
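One possible approach, sketched in plain Ruby: keep the string exactly as entered for display, and derive an earliest/latest year from it for searching. This only handles the three year patterns mentioned above, and the two-year margin for "about" is an arbitrary assumption, as are the function names.

```ruby
# Arbitrary assumption: "about 1854" is treated as 1852..1856.
ABOUT_MARGIN = 2

# Derive [earliest_year, latest_year] from a date string, or nil when the
# string carries no year at all (for example, just "March").
def year_bounds(text)
  case text.strip.downcase
  when /\A(\d{4})\z/ then [$1.to_i, $1.to_i]
  when /\Abefore (\d{4})\z/ then [-Float::INFINITY, $1.to_i - 1]
  when /\Aabout (\d{4})\z/ then [$1.to_i - ABOUT_MARGIN, $1.to_i + ABOUT_MARGIN]
  end
end

# True when the stored date could fall within year +/- plus_minus.
def matches_year?(text, year, plus_minus)
  earliest, latest = year_bounds(text)
  return false if earliest.nil?

  earliest <= year + plus_minus && latest >= year - plus_minus
end

["1820", "about 1854", "before 1933", "March"].each do |stored|
  puts "#{stored.inspect} matches 1816 +/- 5: #{matches_year?(stored, 1816, 5)}"
end
```

In a database, the two bounds could sit in their own columns next to the original string, so a range search becomes a comparison on those columns.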

And back to the translation issue: because different languages take up different amounts of space (Arabic is more compact than English, which is more compact than Finnish and German), designing the page can lead to headaches. It’s something you have to keep in mind the whole time. (Unless you’re only developing for English. But I’m not.)

Finally, as for dealing with the actual translating of the app, I’ll probably put together a simple web interface to the database, which volunteers can then access to translate the various terms. (We do something similar here at work, where our main site — http://immigrants.byu.edu/ — is available in six different languages.)

Data models

Yesterday Hilton blogged about the Genesis Data Model, which includes the Genealogy Core and Genealogy Provenance ontologies. I hadn’t really looked much at RDF until I read this, but it seems like a pretty good idea for Beyond’s interchange format.

For example, named graphs allow you to say, “All of this information came from this source.” I’m still not entirely clear on how redundant that might get, but it does allow you to stitch together what each source offered.

I checked out Practical RDF from the library and will be reading it once finals are over. Grokking RDF will no doubt help a lot with figuring out just how it’ll be useful in Beyond’s world. I do very much like the idea of the semantic web. More on this later. Much more. :)

I’ve also started sketching out the Beyond data model. Once I’ve established its requirements, I’ll better be able to tell whether any existing models (including Genesis) fit the bill, and if none do, which of them might be similar enough to alter instead of starting from scratch.

Now to get back to studying…