Genealogy anywhere

During my text processing class this afternoon I was reading through the phpGedView website, and I came across this question on the about page: “How do you synchronize your data with your relatives when they make changes on their computers?” (It’s a question phpGedView tries to answer.) When I read that, I suddenly thought of CVS (Concurrent Versions System). Rather than explain CVS (you can read about it on your own if you’d like), let me explain what this means for genealogy. Imagine the following scenario:

Your family tree data is stored on a central server (either something like Writeboard.com or your own personal web server). When you use Beyond on your Mac, it downloads a local copy of your data from the server and you can work on that without being connected to the Internet (useful for when you’re in the field). If you want to collaborate with someone else (let’s say it’s your sister), you can give her access to all or just part of your data. She downloads her own local copy and works on that. When you’re done making changes, Beyond uploads the data back to the server. Your sister does the same thing. What if there are conflicts? Beyond lets you decide which changes take precedence, so nothing is lost. Seamless.
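
To make the conflict step concrete, here is a rough sketch of a field-by-field three-way merge, the same basic idea version control tools use. None of this is actual Beyond code: the function name `merge_record` and the dictionary layout for a person are assumptions made up for the example.

```python
def merge_record(base, mine, theirs):
    """Three-way merge of one person's fields (hypothetical record layout).

    base   -- the copy both sides last downloaded from the server
    mine   -- my locally edited copy
    theirs -- my sister's edited copy
    Returns (merged, conflicts); conflicts maps field -> (mine, theirs).
    """
    merged, conflicts = {}, {}
    for field in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(field), mine.get(field), theirs.get(field)
        if m == t:        # both sides agree (or neither touched it)
            value = m
        elif m == b:      # only she changed it
            value = t
        elif t == b:      # only I changed it
            value = m
        else:             # both changed it differently: the user must decide
            conflicts[field] = (m, t)
            value = b     # keep the old value until then
        if value is not None:
            merged[field] = value
    return merged, conflicts


base = {"name": "John Smith", "birth": "1850", "death": "1910"}
mine = {"name": "John Smith", "birth": "12 Mar 1850", "death": "1910"}
theirs = {"name": "John A. Smith", "birth": "abt 1851", "death": "1910"}

merged, conflicts = merge_record(base, mine, theirs)
print(merged)      # her name change is taken automatically
print(conflicts)   # the birth date needs a human decision
```

Only the birth date ends up in `conflicts`, because it is the one field we both edited. Everything else merges without bothering anyone.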

It gets better. Say you have three computers at home. You install Beyond on the second computer and it will automatically download your data from the server, complete with latest changes. Ditto for the third. You can work on whichever one you want and don’t have to worry about synchronization — Beyond takes care of that for you. If you go on a trip, you can take your laptop with you, work on your genealogy as much as you want to, and when you get back home, it’ll re-sync with the server so that your desktops will automatically get the updates.

Now for the crown jewel: what if you’re on a computer where you can’t install Beyond? What if (gasp) you’re on a Windows box? There’ll also be a web app which lets you access the same data as the desktop app, since it’s all on that central server, and so you’re up-to-date no matter where you go in the world, on any computer. Any computer. And you don’t have to worry about flash drives or copying files or anything like that. The web app would have to have reduced functionality, of course, but you’d still be able to do all the things that matter (including printing charts to PDF). So you’re at the library and forgot to bring your pedigree with you? No worries, just go to one of the computers, log in, and print the chart (remember that this is from your latest data, the same you were working on fifteen minutes ago when you left your house).

As far as the user interface goes, it’ll have to be seamless, utterly transparent and easy to use. No clunky upload/download dialogs (unless something goes wrong). It’d be nice to be able to “check out” only certain parts of your tree, but I don’t know if that’d work too well… An advantage would be that edit histories would be built-in (by nature of the whole central-access metaphor), so you could see who edited what and when. Another is having duplicate copies of your data (you have at least two, one on your desktop and one on the server, and more if you’re collaborating with someone else). Truth be told, I can’t think of any disadvantages. If you can, please leave a comment.
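
The built-in edit history could be as simple as an append-only log of changes. This is only a sketch of the idea, not a design decision: the `Change` and `History` names and their fields are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One edit to one field of one person (hypothetical layout)."""
    person_id: str
    field: str
    old: Optional[str]
    new: Optional[str]
    editor: str
    when: datetime


class History:
    """Append-only log: entries are added, never rewritten."""

    def __init__(self):
        self._changes = []

    def record(self, person_id, field, old, new, editor, when=None):
        stamp = when or datetime.now(timezone.utc)
        change = Change(person_id, field, old, new, editor, stamp)
        self._changes.append(change)
        return change

    def for_person(self, person_id):
        """Who edited what, and when, for a single person."""
        return [c for c in self._changes if c.person_id == person_id]


history = History()
history.record("I1", "birth", "1850", "12 Mar 1850", editor="ben")
history.record("I1", "name", "John Smith", "John A. Smith", editor="sister")
history.record("I2", "death", None, "1910", editor="ben")

for c in history.for_person("I1"):
    print(f"{c.when:%Y-%m-%d} {c.editor} changed {c.field}: {c.old!r} -> {c.new!r}")
```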

You know, this is extra incentive to use SQLite, because I can use that for the desktop implementation and a normal MySQL database for the online storage. Hmm, I’m liking this idea a lot. But of course you wouldn’t have to have an online server to be able to use Beyond — it’d be perfectly usable as a standalone desktop app that never connected to the Internet at all. No worries there.
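
Here is roughly what the desktop side might look like, using Python’s standard `sqlite3` module. The `person` table, the `dirty` column, and the helper functions are all assumptions for the sake of the example. The server half (MySQL) is left out entirely.

```python
import sqlite3

# An in-memory database keeps the example self-contained;
# a desktop app would pass a file path instead.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id    TEXT PRIMARY KEY,
        name  TEXT NOT NULL,
        birth TEXT,
        dirty INTEGER NOT NULL DEFAULT 1  -- 1 = changed locally, not yet uploaded
    )
""")


def save_person(conn, pid, name, birth=None):
    """Insert or overwrite a person and flag the row as needing upload."""
    conn.execute(
        "INSERT OR REPLACE INTO person (id, name, birth, dirty) VALUES (?, ?, ?, 1)",
        (pid, name, birth),
    )


def pending_uploads(conn):
    """Rows edited since the last sync."""
    return conn.execute(
        "SELECT id, name, birth FROM person WHERE dirty = 1 ORDER BY id"
    ).fetchall()


def mark_synced(conn, ids):
    """Clear the flag once the server has accepted the rows."""
    conn.executemany("UPDATE person SET dirty = 0 WHERE id = ?", [(i,) for i in ids])


save_person(conn, "I1", "John Smith", "1850")
save_person(conn, "I2", "Mary Jones")
mark_synced(conn, ["I1", "I2"])                  # pretend both were uploaded
save_person(conn, "I2", "Mary Jones", "1853")    # offline edit in the field
print(pending_uploads(conn))
```

Because the standalone case never calls `mark_synced`, the same local database works fine with no server at all.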

So, after I came up with all this, I was reading Dan Lawyer’s post Raising the Bar for Record Managers and came across this comment by Dallan Quass (the guy in charge of WeRelate):

Another possible 11th suggestion is the ability to “sync” your local desktop client with an on-line record manager, where you can see what changes others have made and accept or reject those changes in your desktop repository. This is similar to what software engineers use when a group of distributed engineers collaborate using a shared on-line repository that can be sync’ed with their off-line desktop repositories.

Sounds mighty familiar. :) But to my credit, I hadn’t read that until after I’d come up with all of the above. But that doesn’t matter, because this isn’t about me, this is about progress and making a better genealogy experience for all of you. And that’s what I’m committed to.

    Comments on “Genealogy anywhere”:

  1. Hilton

    I’m not sure if you’re aware of the FamilySearch FamilyTree service that the Church is currently working on. You’ve basically outlined a good deal of what they’re trying to accomplish (and succeeding, as they’re starting beta 2 soon). Something to look into if you haven’t yet.

  2. Ben

    I’m aware of it, yes. From what I’ve seen, though, it doesn’t seem like they’re expecting users to actually do their genealogy on the site. Rather, they’d use PAF to do their research and then upload their information (via GEDCOM, I think, but maybe not). You can do limited editing, if I recall correctly, but it didn’t seem to be very fleshed out. But I could be completely wrong. One of my coworkers was a beta 1 tester, and I think he can still log in to the site. If so, I’ll take a look at it on Monday and see how much overlap there is. It’d be pointless to reproduce work that the Church is doing (especially because they have so much more manpower and significance). Hmm…

  3. Hilton

    I hope to hear what you find out. I only raised the concern because I’ve had to consider whether or not I was duplicating effort with my project. I’ve concluded that I’m not, and the same is probably true for you.

    As for this genealogy anywhere concept, it’s great. I look forward to seeing it develop.

  4. Ben

    Oh, it’s certainly a valid and important question, since duplicating effort often feels like a waste (unless there are very good reasons for doing it). And I don’t really want to step on the toes of the Church. :)

    You may have seen this already, by the way, but Dan Lawyer (product manager for the Family History Department) has a blog, and he recently posted about raising the bar for record managers. It’s a good read and he raises many important points.

  5. Dallan Quass

    I guess great minds think alike :-). Seriously though, here are some issues I’ve thought of, and I’m wondering what your thoughts are on them:

    What will you do about data on living people? Do you provide a very secure (and detailed) access control model, or do you not store data on living people on the server at all? There are also international issues here that I’m not totally aware of, where some European countries have strict limits on what information can be kept on living people and where it can be stored.

    What if your sister (or anyone else you want to share your genealogy with) uses a different client than the one you’re working on? You could say that she would just export/import gedcom from the central server, but the gedcom “standard” is loose enough that it’s likely there will be some data loss round-tripping between the gedcom format her software emits and the one your centralized server accepts and emits. Would you require that everyone wanting to work on your pedigree with you use the new client you’re developing?

    How important is the need to allow off-line updates to your pedigree? If you remove this requirement, you can use Ajax (say using the new Tibco SDK) or Flash (say using Adobe’s new Flex SDK) to build a cross-platform interactive pedigree program. Then others who wanted to work with you on your pedigree could either (a) use the offline client you’re building, or (b) if they didn’t want to use that client for some reason they could edit the pedigree online using the Ajax/Flash interface.

    By the way, the Church *does* expect people to do editing on their new website. But there are some things that make it less than ideal for a family group to share updates amongst each other, so what you’re doing is still worthwhile.

    Also by the way, if you’re interested in working on an Ajax/Flash interface, let me know.

  6. Ben

    Hmm, I hadn’t thought about data on living people. Right now I don’t necessarily see Beyond as being a published set of data (meaning, even though you can access it from a web client, it’s not a public set of pages). Originally I’d thought of having Beyond output a set of HTML files, which seems to be the norm, but instead I think it would be better to say, “I want these lines to show up on this website,” and that way the link is live and you don’t have to keep uploading those HTML files. The less the user has to do to keep things going, the better. This way, the user would choose which lines and which people to expose to the public view, basically, and I think that solves the living people problem.
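
    A quick sketch of that “only what I choose is public” idea. The `public_view` function and the `living` flag are made up for this example, and treating anyone without the flag as living is an extra safety assumption, not something already decided.

```python
def public_view(people, exposed_ids):
    """Return only the people the owner explicitly chose to publish.

    Assumption: a record with no 'living' flag is treated as living,
    so it stays private even if it was selected by mistake.
    """
    return [
        {"id": p["id"], "name": p["name"]}
        for p in people
        if p["id"] in exposed_ids and not p.get("living", True)
    ]


people = [
    {"id": "I1", "name": "John Smith", "living": False},
    {"id": "I2", "name": "Mary Jones", "living": False},
    {"id": "I3", "name": "Ben", "living": True},
]

# I1 was selected and is deceased, so it shows up; I2 was never
# selected; I3 was selected but is living, so it is held back.
print(public_view(people, {"I1", "I3"}))
```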

    As for compatibility, Beyond will be PAF compatible, and if I can get file format specs for other record managers, it’ll read/write those formats as well. I agree that GEDCOM is rather lossy in some cases, and that’s not good. There are other standards (most based on XML, it seems), and I’ll be evaluating those to see if any would work. Not for storage, that is (it’ll be SQL), but for transferring data. In all reality, though, I doubt that those companies are necessarily going to want to hand out their file format specs. Some may, but certainly not all.

    I think that to pull off the synchronization, other collaborators will have to use Beyond, because other record managers won’t have the sync functionality. (If they added it, however, that’d be another question.) My hope is that Beyond will be good enough that everyone will want to switch to it anyway. ;)

    I’m not quite sure I understand what you mean by removing offline updates to the pedigree. I do plan on using Ajax to build the web client, and it’ll be quite functional for editing one’s pedigree. (The main difference I see between the offline clients and the web client is that some of the fancier stuff won’t be in the web client, but with advances in technology it may become a moot point.) I want the web client to be as rich as possible. In fact, if this takes off, I see people mainly using the web client instead of the desktop client (like Gmail and the other web-based e-mail clients replacing desktop e-mail clients). And yet there still needs to be offline functionality, because sometimes you go places where there isn’t any Internet access.

    Does the Church expect Family Tree to replace PAF? It didn’t seem like that was the case. Hopefully Beyond will be able to interface with Family Tree (and WeRelate, for that matter). We’ll see.

  7. Dallan Quass

    That makes sense regarding living data: require people to grant specific access to other users. Also, if you require that everyone use Beyond, then you don’t need to worry as much about GEDCOM compatibility. Your allowing non-Beyond users to edit the pedigree online using an Ajax client is what I was trying to suggest in my previous note. My question is the same as yours: which client will be used by the most people, and which should be developed first?

    I don’t think you could say that the Church expects Family Tree to replace PAF. First of all, their initial incarnation doesn’t have a desktop interface. Nor does it have the ability to export GEDCOM. My understanding is that these things will be added later. And they’ll probably want other desktop genealogy programs to interface with Family Tree through some kind of sync protocol. But a shared family repository like a CVS repository, where the updates that you make are automatically reflected in my copy the next time I sync, and where the two of us can choose to share data on living as well as dead people, would require more work I think. So it seems you have a good niche with your idea.

    BTW, I’ve checked into the various XML GEDCOM specs. Most of them don’t appear to have had any activity in years. GenXML is the only exception I could find. Developing a parser that can accept the various flavors of GEDCOM out there and output data suitable for XML or database representation is proving to be somewhat of a challenge.
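
    For anyone who hasn’t looked inside a GEDCOM file: each line is a level number, an optional @xref@ id, a tag, and an optional value, and the level numbers give the nesting. A bare-bones sketch of turning that into a tree follows. It is not the parser described above; the `parse_gedcom` name and node layout are invented here, and it handles none of the vendor quirks or CONT/CONC continuation lines that make the real job hard.

```python
import re

# level, optional @xref@, tag, optional value
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$")


def parse_gedcom(text):
    """Turn GEDCOM lines into a list of nested dict nodes."""
    root = {"tag": "ROOT", "children": []}
    stack = [root]  # stack[n + 1] is the currently open node at level n
    for raw in text.splitlines():
        if not raw.strip():
            continue
        m = LINE.match(raw)
        if m is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level = int(m.group(1))
        if level >= len(stack):
            raise ValueError(f"level skips a generation: {raw!r}")
        node = {
            "tag": m.group(3).upper(),
            "xref": m.group(2),
            "value": m.group(4) or "",
            "children": [],
        }
        del stack[level + 1:]               # close any deeper open nodes
        stack[level]["children"].append(node)
        stack.append(node)
    return root["children"]


sample = """
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
0 TRLR
"""

records = parse_gedcom(sample)
person = records[0]
print(person["xref"], person["children"][0]["value"])
print(person["children"][1]["children"][0]["value"])
```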

  8. Ben

    I’ve become pretty much convinced that the way to go is develop the web client first and foremost, because it’s going to be used by the most people. I personally plan on spending my time working on the web, but the API will make desktop clients possible (and hopefully people will want to write them), so that there’s offline synchronization in those cases where people don’t have Internet access. (And there’ll certainly be those cases.) So web it is. And if it appears that a desktop client is desperately needed, so be it, but I have a feeling that it’s not going to be as important as I thought. (And if I’m wrong, well, the API will make it an easy problem to fix. :))

    The CVS repository idea is still important, of course, even using web clients alone, because there’s always the possibility that you and someone else could be working on the data at the same time.

    As for Family Tree, even if Beyond ends up being similar (excluding the desktop clients for the time being), I think it’s still worth it to proceed, because innovative ideas are important (like tags, for example). And variety is good. And people may want to host it themselves. Okay, I think I need to come up with better reasons, and it’s starting to sound like I’m repeating myself a lot. :)

    Is Family Tree going to be open source, by the way?

  9. Dallan Quass

    I’m not the best person to ask regarding whether Family Tree is going to be open source. The Church does seem to be moving (albeit slowly) in that direction.

  10. Hilton

    I attended a UVPAFUG meeting where Brad Christensen said that Family Tree would be open source. What that means specifically, I don’t know.

    Regardless, Beyond as a web application is a great idea. I’ve been mulling over some future scenarios and it seems to me that a centralized solution will not ultimately be scalable. Furthermore, not everyone will trust the Church as a data repository. And of course, as you mention Ben, some simply want to host their data themselves. This offers the additional advantage of flexibility and innovation, which would come much more slowly to a bureaucracy.

    I think these reasons are compelling enough to press forward. I would make the obvious recommendation though of a careful separation between data and presentation. Create a solid data source component, along with a useful web-based UI. Don’t worry about the API just yet. When/if anyone wants to write a fat client for it, it won’t be difficult to design a good API. If no one does, at least you’ll have a good architecture.

  11. Ben

    Just out of curiosity, would you care to expound on why a centralized solution won’t be scalable? (Not that I disagree, but I’m interested to see why.)

    And yes, I agree that not everyone will want to host their data with the Church. That’s just something that happens with religious organizations. And I also agree that innovation is a lot easier in a smaller group (which is why startups are often successful as opposed to large companies like Microsoft — Paul Graham talks a lot about this).

    There will indeed be separation between data and presentation. It’s often a pain when the presentation doesn’t do what you need it to do. (For example, today in the lab where I work, someone wanted to copy the individual list from PAF into Excel. But you can’t do that, or at least not in any easy way. You can’t select the list or do anything except export it into GEDCOM, really. Not good.) Locking anyone into using only your program is a bad thing.

  12. Hilton

    The main reason I don’t think it will be scalable is because genealogy is (or will be soon) a lot more than just textual records. Initially FSFT won’t be accepting large notes or attachments. That may change, but how much will you be able to upload in the end? Photos? High-res scans of historical documents? Audio? Video? I don’t think it’s the Church’s intention to service a massive media locker. Nor do I think they could, effectively. This requires a distributed solution.

    I’m particularly interested in the high-res scans. It is ludicrous for people to type information from old documents into their computer when they can scan them with inexpensive hardware and perform human-assisted OCR. You end up with fewer errors, and the actual source can be automatically included in the database with no extra effort. If I disagree with what someone typed in, I can just look at the scan instead of hunting down the actual document.

  13. Ben

    Ah, good point. I haven’t thought too much about the media part yet, but it warrants a fair amount of attention. And it feels like there’s room for some good innovation here. Haven’t a clue what exactly that would be, but it’s there.

    Does anyone know how OCR for old handwriting is coming? Last I heard, it was still pretty far behind, but maybe some advances have been made. But I agree that having a scan of the original document right there is very nice.

  14. Dallan Quass

    Check out http://ciir.cs.umass.edu/irdemo/hw-demo/ This is the most advanced demo for old handwriting recognition I’ve seen. There’s also a research group at INRIA, but I don’t know if they have an on-line demo yet.

  15. Ben

    Cool. OCRing old handwriting (or even some people’s handwriting nowadays :)) is a tough problem, and it’ll be interesting to see how things progress…
