Exporting LiveJournal

For a long time now, I’ve wanted to export my LiveJournal account to PDF files so that I have a local copy of it. But LiveJournal has no export feature. There are sites like BlogBooker, who (for a fee and my LJ login) will generate PDFs for me; there are also other sites which (if I give them my LJ login) will import my LJ posts and comments. But I don’t trust any of those services to get everything. Plus, I wanted to find a solution on my own.

Attempt #1: use Automator (a macOS tool) to go through the site, scrape the URLs, visit the pages, and print them. And Automator has built-in functions to do all this! Problem is, LJ requires a login for displaying my restricted-access posts, and Automator can’t log in, so it misses a lot of my posts. And Automator insists on doing all the work itself instead of going through Safari. Oh well.

Attempt #2: use Selenium to control Safari to go through the site and print the pages. I’ve used Selenium at work for automated web page tests; it’s a powerful tool. Only, it insists on launching a clean instance of Safari every time (not reusing my login). I’ve tried binding it to my existing Safari instance but that just throws errors. Oh well.

Attempt #3: maybe I can write an AppleScript to send JavaScript to Safari to scrape links from pages, visit them, and select File -> Export As PDF automatically. And actually, as I was working on this, I had a better idea: why scrape links at all? I could just point AppleScript at the first post in my LJ, code it to use the Export As PDF menu item, and code it to click the “next entry” link on the page. It could then go through my entire journal sequentially, exporting posts as it went.

So that’s the approach I decided to go with, and it worked fine.

Here’s the AppleScript I came up with, in case it helps anyone else out there. This works with Safari, in Script Editor on macOS Sonoma (originally Catalina) – updated in January 2023 to work on the current macOS.

-- This script will export LiveJournal pages to PDF, one by one.
-- Start on the first LiveJournal entry page that you want to export to PDF.
-- (An entry, not a date. The URL needs to end in ".html".)
-- The script will save it (filenames will be sequential numbers) then click "Next Entry" and repeat.

-- Update postNumber here if the script fails and you need to restart it in the middle of your journal.
set postNumber to 1

-- Make sure this directory exists first.
set savePdfPath to "~/Desktop/pdfs/"

-- You may need to change the title to whatever your journal style uses.
-- set nextEntryLink to "document.querySelector('[title=\"next entry\"]')"
set nextEntryLink to "document.querySelector('[alt=\"Next\"]')"

-- Coordinates of the Safari window: left, top, right, bottom
-- Only the width really matters, because that affects the width of the PDFs.
-- (We want to keep the width consistent across your PDFs,
-- and not too wide or the text will be tiny if you ever print it out.)
tell application "Safari" to set the bounds of the first window to {100, 100, 1115, 1000}

set done to false
repeat until done
	
	-- save this page as a PDF by using the "Export as PDF…" menu item
	tell application "System Events"
		tell process "Safari"
			set frontmost to true
			
			-- if you want 3-digit numbers, change -4 to -3
			set numberAsString to text -4 thru -1 of ("0000" & postNumber)
			
			-- note that the menu item text uses an ellipsis character, not three periods
			-- also, sometimes it misses the menu selection, so you'll need to select it by hand
			click menu item "Export as PDF…" of menu "File" of menu bar 1
			repeat until exists sheet 1 of window 1 -- loop until it notices the click
				delay 1
			end repeat
			keystroke "g" using {command down, shift down} -- go to folder
			repeat until exists sheet 1 of sheet 1 of window 1
				delay 0.02
			end repeat
			tell sheet 1 of sheet 1 of window 1
				set value of text field 1 to savePdfPath
				keystroke return -- press the Return key (a quoted "Return" would type the word instead)
			end tell
			set value of text field 1 of sheet 1 of window 1 to numberAsString
			click button "Save" of sheet 1 of window 1
		end tell
	end tell
	
	-- make sure we have a link to the next page
	tell application "Safari" to set hasNext to (do JavaScript nextEntryLink & " !== null;" in document 1)
	if hasNext is false then
		set done to true
		exit repeat
	end if
	
	-- go to the next page
	tell application "Safari" to (do JavaScript nextEntryLink & ".click();" in document 1)
	set postNumber to postNumber + 1
	delay 2 -- give us a chance to leave the previous page first
	-- then wait until JavaScript says the page finished loading (though I don't know if this is reliable)
	tell application "Safari"
		tell document 1 to repeat
			do JavaScript "document.readyState"
			if the result = "complete" then exit repeat
			delay 0.5
		end repeat
	end tell
	
end repeat

display notification "Finished exporting your LiveJournal to PDF."

22 thoughts on “Exporting LiveJournal”

  1. This is genius. I had done an xml export a million years ago with a tool that probably no longer exists, and I’ve always found it striking that there’s no easy way to do an export natively. I’ve been slowly cleaning mine out for over a year, just a few posts at a time, when I think of it. I miss the community aspect of LJ very much, but most folks have long since wandered off.

    1. Thank you very much! This was a fun project and I’m glad it works as well as it does. I also really do miss LiveJournal, for the community aspect as well as the long-form writing. Maybe I’ll get back into it again one of these days…

      But, next up is to modify my script to delete posts.

    1. Slightly different use cases.

      His goal was to export his LJ in a way that could then be brought into WP. Doing it that way you’ve always got to wonder if you got everything, if you handled all the edge cases correctly, and you have to decide how you want the data represented on WP where the feature set is somewhat different.

      My goal was to get a pretty archive of my LJ, with the custom layout it uses on LJ, so that the PDFs look like the original LJ pages.

      It’s like the difference between being an organ donor and going to a Sears portrait session. Both involve representations of who you are, but they differ greatly in what the end result gives you.

  2. Pingback: Deleting LiveJournal | enchanter

  3. Hello,

    Giving this a whirl. My technical skills are limited, but I understood well enough to get this script to run. Problem is, it processes one entry and stops, without advancing to the next one. Every entry in my LJ is private, and many of them don’t have titles. Are either of these things an issue? Any ideas how I can get past this obstacle? I have 20 years worth of entries to export, so hoping to not have to do this manually. Appreciate any advice or support you can offer.

    1. This may be a simple fix – you just need to correct the selector for your “Next” link.

      My script has: set nextEntryLink to "document.querySelector('[title=\"next entry\"]')"
      which looks for an HTML element with a “title” attribute set to “next entry”.

      Right-click your Next Entry link and Inspect it. See if maybe it has “title” set to something different, or if there’s another way you can have the script identify it? If it would help, give me a link to (a public entry in) your journal and I’ll have a look.

        1. I think I found it. Your ‘next’ button has the words capitalized: “Next Entry”, unlike mine which has “next entry”. Capitalize the N and the E in that one part of your script, and you should be good – let me know!

          1. That did the trick! Thank you so much for taking the time to help me and for this awesome script.

  4. I managed to get this to work despite having limited scripting/coding knowledge! But, it doesn’t seem to want to advance to the next entry. Any idea what the issue might be? My LJ does have a ‘Next Entry’ button. However, all my posts are private – could this be the issue? Hoping to not have to do this manually, since I’ve got 20 years (!) worth of entries to export! Appreciate any advice or support you can provide.

    1. It’s been a year and a half since you posted this – my apologies, somehow I never saw it!

      Advancing to the next entry is a tricky part of this script. You need to find the specific JavaScript query that returns the link. In my script above, that’s "document.querySelector('[title=\"next entry\"]')", which matches any element in the HTML whose title is “next entry”. You would have to experiment in the JavaScript console to figure out exactly what your “next entry” link looks like and how to get it in JavaScript. (Maybe it’s as easy as just changing the title in my script from “next entry” to whatever the text is on your LJ?)
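
      As a starting point for that experimenting, here is a minimal sketch you could paste into Safari's JavaScript console on one of your entry pages. It lists every link or image whose text, title, or alt mentions "next", so you can see which attribute your journal style actually sets. The function name listNextCandidates is made up for this example; it is not part of the script above.

```javascript
// Hypothetical helper: collect links and images that look like a "next" control.
// `doc` is the page's document object (pass `document` in the browser console).
function listNextCandidates(doc) {
  var nodes = Array.prototype.slice.call(doc.querySelectorAll('a, img'));
  return nodes
    .map(function (el) {
      return {
        tag: el.tagName,
        text: (el.textContent || '').trim(),
        title: el.getAttribute('title'),
        alt: el.getAttribute('alt')
      };
    })
    .filter(function (info) {
      // Missing attributes are null, which join() turns into empty strings.
      return /next/i.test([info.text, info.title, info.alt].join(' '));
    });
}

// In the browser console: console.table(listNextCandidates(document));
```

      Whichever column holds your "next" wording tells you which selector to try: a title goes in [title=...], an alt goes in [alt=...], and plain link text needs a JavaScript search instead of a CSS selector.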

      I hope you’ve been able to get it working in the past year and a half. 🙂

    1. That’s a really good question! I did it on Mac because AppleScript lets me control the web browser, tell it to run JavaScript, make system menu selections, click buttons in system modals, &c. I don’t believe that Windows has a system-wide scripting capability like this. There’s a tool named Selenium which is usually used to automate testing of web sites, but maybe it can be used for something like this?

      My best advice for a PC user is: borrow a Mac for a little while to run this script. 🙂 Not making any value judgment about whether Mac or Windows is better; just saying that this particular task happens to be easier on a Mac.

      (and, before anyone brings it up – I don’t think Linux has any system-wide UI scripting capability, either.)

  5. I’ve been looking for a way to export my LJ entries & comments, because the official LJ export is totally inadequate, so I was really excited when I found this post. The AppleScript so very nearly works for me – it exports the first entry to PDF but then it fails to find the “Next Entry” link on that page. It’s obviously there because I can click on it, but the querySelector step isn’t finding it. Any help would be greatly appreciated! https://henman.livejournal.com/

    1. I had a look at your LJ. Your individual post pages (for example, https://henman.livejournal.com/1506694.html) have a Next Entry link that looks like <a href="https://henman.livejournal.com/1511256.html" rel="nofollow ugc">Next Entry</a>. I don’t think there’s a CSS selector that can select that A element based on the text inside it. Mine works on my pages because my Next Entry links have title="next entry" as a property on the A tags.

      I’m rusty on my CSS, but some options for you might be:

      • find a CSS selector to recognize <a href="...">Next Entry</a>
      • edit your journal’s layout to add title="next entry" to these A tags
      • modify my script slightly so that JavaScript finds the A tag with the “Next Entry” text in it, and clicks on it
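
      The third option could look roughly like this. It is only a sketch: findLinkByText is a name invented for this example, and it assumes the link's visible text is exactly "Next Entry" apart from capitalization and surrounding whitespace.

```javascript
// Hypothetical helper: return the first <a> whose visible text matches `label`
// (ignoring case and surrounding whitespace), or null if there is none.
// `doc` is the page's document object (pass `document` in the browser console).
function findLinkByText(doc, label) {
  var wanted = label.trim().toLowerCase();
  var anchors = Array.prototype.slice.call(doc.querySelectorAll('a'));
  for (var i = 0; i < anchors.length; i++) {
    var text = (anchors[i].textContent || '').trim().toLowerCase();
    if (text === wanted) {
      return anchors[i];
    }
  }
  return null;
}

// In the browser console: findLinkByText(document, 'Next Entry')
```

      The AppleScript appends text to nextEntryLink (" !== null;" and ".click();"), so whatever goes there has to be a single expression that evaluates to the element or to null. The loop above would need to be condensed into one parenthesized expression before it could be pasted into the script.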

      Good luck! Let me know what you find!

      1. Thanks Brian. I think changing the LJ style is my easiest route (I don’t really care about the style – I just want to back up my content) so I’ll see if I can find an existing style that has the href title set.

  6. Hi! I was wondering if this method or any method might work for my livejournal that I seem to have lost access to and the majority of my posts are private.

    A friend passed away and it’s made me want to go back and reread everything I chronicled back in the day and I’m devastated it might be lost.

    Any and all help or advice would be so appreciated!

    1. Your first concern is regaining access to your LiveJournal account. Use their “Forgot password” link on login, or if you no longer have access to the email account it’s sending the reset password emails to, you may need to contact LiveJournal support directly. Without being able to log in, there’s no way to see your private posts.

      Once you’re back in, you would be able to use this script to download your journal entries – though you might have to modify it to get it to work with your particular journal. Or, you could take screenshots of the private posts.

      Good luck, and I hope you’re able to get back in and reconnect with the friend you lost. I also lost a friend who I knew through LiveJournal, and I’m glad to still have his comments on my posts.

  7. Brian, I’m trying to use your script to archive my lj, riffraff.livejournal.com starting with my first post https://riffraff.livejournal.com/2001/05/10/

    Mine has days, not entries, and no alt attribute on the next link, so I tried to use
    set nextEntryLink to "document.evaluate(\"//a[contains(., 'Next Day')]\", document, null, XPathResult.ANY_TYPE, null )"
    but it’s not clicking the link.

    Second issue I think will be that there are days with no posts… my journal doesn’t seem to have “next entry” links.

    1. I can help!

      You do have individual entries – you’re just looking at them in the day format, which shows you all entries on a specific day (or no entries if there weren’t any on that day). There are no next/previous day links. You can see a complete (paginated) list of all of your entries at “https://riffraff.livejournal.com”, and if you go into an entry from there (click on its title), you’ll see Previous Entry and Next Entry.

      After some trial and error, I think the JavaScript query which returns your Next Entry link is:

      Array.prototype.slice.call(document.querySelectorAll('a')).filter(function (el) { return el.textContent === 'Next Entry' })[0];

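      and so the “set nextEntryLink” you should use in your script (all on one line; no carriage return) is below. Note that it has no trailing semicolon, because the script appends “ !== null;” and “.click();” to this expression, and it ends in “|| null” inside parentheses so that the “!== null” check still stops the script on your last entry:

      set nextEntryLink to "(Array.prototype.slice.call(document.querySelectorAll('a')).filter(function (el) { return el.textContent === 'Next Entry' })[0] || null)"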

      Give that a try, and please let me know if it works!
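
      For what it's worth, I suspect the XPath version you tried did nothing because document.evaluate returns an XPathResult object rather than the link itself, and an XPathResult has no click method. If you prefer XPath, a sketch like the following asks for the first matching node and unwraps it. The name firstNodeByXPath is invented for this example, and I'm assuming the link text contains "Next Entry".

```javascript
// Hypothetical helper: return the first node matching an XPath expression, or null.
// `doc` is the page's document object (pass `document` in the browser console).
function firstNodeByXPath(doc, xpath) {
  // 9 is the numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE.
  var FIRST_ORDERED_NODE_TYPE = 9;
  var result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  // singleNodeValue is the matched element, or null when nothing matches.
  return result.singleNodeValue;
}

// In the browser console:
//   firstNodeByXPath(document, "//a[contains(., 'Next Entry')]")
```

      Because singleNodeValue is null when nothing matches, this shape works with the script's existing "!== null" check.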
