Film Glance Forum

I need some help. How do you archive content?

    MuggySphere — 9 years ago(February 04, 2017 06:40 PM)

    Oh, OK.
    I'll go find one, thanks.

      Sleeping_in_Sleepy_Hollow — 9 years ago(February 04, 2017 07:09 PM)

      I need help with this. I want to save everything once they realize I wasn't the one posting harsh comments and give me back my archives.
      http://www.my-diary.org/users/851091

        Jeorj Euler — 9 years ago(February 04, 2017 08:43 PM)

        Well, the idea is to download every page of the thread listing on a given board and then download every page of every thread directly referenced in the downloaded board pages. It's just iteration after iteration. In some ways it is easier to use scripts, but a web crawler can be configured to do it. I think we're being granted temporary permission to do this, when it'd ordinarily be unwelcome. The subject belongs on the "Computers and Software" board:
        /board/bd0000100/threads/
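
The iteration described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a tested archiver: the `?p=N` page parameter, the `fetch` callback, and all function names are hypothetical and only there for illustration. Only the `/board/<id>/threads/` and `/board/<id>/view/<id>` path shapes come from the links posted in this thread. `fetch` is expected to return a page's HTML, or `None` once there are no more pages.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional

# Path shape taken from the links in this thread: /board/<board-id>/view/<thread-id>
THREAD_LINK = re.compile(r"^/board/bd\d+/(?:view|nest)/\d+")

Fetch = Callable[[str], Optional[str]]  # hypothetical: URL path -> HTML or None


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def thread_links(html: str) -> List[str]:
    """Return the unique thread paths linked from a board listing page, in order."""
    parser = LinkCollector()
    parser.feed(html)
    seen, out = set(), []
    for href in parser.hrefs:
        match = THREAD_LINK.match(href)
        if match and match.group(0) not in seen:
            seen.add(match.group(0))
            out.append(match.group(0))
    return out


def crawl_pages(fetch: Fetch, path: str, max_pages: int) -> List[str]:
    """Fetch path?p=1, path?p=2, ... until fetch returns None or max_pages is hit.

    The ?p=N pagination scheme is an assumption, not the site's documented scheme.
    """
    pages = []
    for n in range(1, max_pages + 1):
        html = fetch(f"{path}?p={n}")
        if html is None:
            break
        pages.append(html)
    return pages


def archive_board(fetch: Fetch, board_path: str, max_pages: int = 50) -> Dict[str, str]:
    """Download every listing page of a board, then every page of every linked thread."""
    archive: Dict[str, str] = {}
    for i, listing in enumerate(crawl_pages(fetch, board_path, max_pages), 1):
        archive[f"{board_path}?p={i}"] = listing
        for thread in thread_links(listing):
            for j, page in enumerate(crawl_pages(fetch, thread, max_pages), 1):
                archive[f"{thread}?p={j}"] = page
    return archive
```

Keeping `fetch` as a parameter means the loop can be tried against saved pages or a plain dictionary before pointing it at a live site, where a real `fetch` would also want a pause between requests.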

          Sleeping_in_Sleepy_Hollow — 9 years ago(February 04, 2017 09:10 PM)

          Check your PM. Need more help.
          http://www.my-diary.org/users/851091

            Jeorj Euler — 9 years ago(February 05, 2017 07:09 AM)

            Hm. Please stand by. Archiving the topical message boards is somewhat trivial. It's the title boards and name boards that would take forever to scan and download.

              Jeorj Euler — 9 years ago(February 05, 2017 08:13 AM)

              Please see /board/bd0000001/view/265787442 and check /board/bd0000100/threads/

                MuggySphere — 9 years ago(February 05, 2017 03:58 PM)

                So would a script or webcrawler be good for archiving the movie forums for movies you have rated and discussed?
                That's the kind of thing I was wondering about, for all the movie boards I have participated in.

                  WillEd — 9 years ago(February 05, 2017 04:16 PM)

                  You could use WinHTTrack Website Copier to archive specific movie forum pages.

                    timmyp-98035 — 9 years ago(February 05, 2017 04:43 PM)

                    IMDb should allow archive.org to do it.

                      MuggySphere — 9 years ago(February 05, 2017 06:36 PM)

                      They should, but it would probably cost money, wouldn't it?

                        MuggySphere — 9 years ago(February 05, 2017 06:41 PM)

                        OK found it and I'm off to try.
                        There's a few boards I just have to keep for myself. Some of the discussions I really enjoyed so it would be nice to keep them.

                          MuggySphere — 9 years ago(February 05, 2017 08:14 PM)

                          Hey there, that program works fast.
                          But it only seems to copy the first page of a forum and not the pages in the links when you click on a discussion. Or did I miss a setting that would copy all the pages linked to the main forum page?

                            WillEd — 9 years ago(February 07, 2017 07:19 AM)

                            You can't copy an entire forum with it. It works better on some sites than others. I tried copying a 3 page thread here with it and it picked up two pages, but not the third.

                              MuggySphere — 9 years ago(February 07, 2017 04:31 PM)

                              What settings did you use?
                              No matter what I fiddled with it would only take the first page of anything I tried to save.

                                !!!deleted!!! (61311691) — 9 years ago(February 07, 2017 05:12 PM)

                                Regardless of what you might have read, unless the site admin modifies their robots.txt file, there's nothing we can do other than email them and request an SQL dump. Neither of which is likely to happen.
                                Read the file below:
                                http://www.imdb.com/robots.txt
                                You can plainly see that almost all engines and crawlers are banned, and unless the IMDb administrator changes that file, no program or script you try to use to download the forum data will work. You'll grab the main page, but that's about it.
                                More here:
                                http://www.imdb.com/board/bd0000001/nest/265784055?d=265829976#265829976

                                  MuggySphere — 9 years ago(February 07, 2017 09:16 PM)

                                  Ah. Oh well, bye bye IMDb Forums, you will be sadly missed.

                                    !!!deleted!!! (61311691) — 9 years ago(February 07, 2017 05:22 PM)

                                    CORRECTION: It's not that the crawlers are banned (my apologies), it's that most of the directories are protected.

                                    robots.txt for IMDb properties
                                    User-agent: *
                                    Disallow: /board
                                    Disallow: /boards

                                    This means that ALL crawlers and scanners are told to stay out of the entire board directory and all sub-directories below it, so any tool that honors robots.txt will skip those pages. See the actual file for more.
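
To see what those rules mean for a particular path, Python's standard `urllib.robotparser` can evaluate them. This is a minimal sketch, assuming the two Disallow lines quoted in the post are the only relevant rules (the live file contains more). The user-agent name `MyArchiver` and the `/help/` path are made up for illustration. Keep in mind that robots.txt is a convention that well-behaved crawlers follow, not an access control enforced by the server, so the check shows what a compliant tool would skip.

```python
from urllib.robotparser import RobotFileParser

# Only the rules quoted in the post; the real robots.txt has more entries.
RULES = [
    "User-agent: *",
    "Disallow: /board",
    "Disallow: /boards",
]


def allowed(path: str, agent: str = "MyArchiver") -> bool:
    """Return True if the quoted rules permit `agent` (a hypothetical name) to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(RULES)  # parse() takes an iterable of robots.txt lines
    return parser.can_fetch(agent, path)


# "/help/" is an arbitrary path outside the board directories, used for contrast.
for path in ("/board/bd0000100/threads/", "/boards/", "/help/"):
    print(path, "allowed" if allowed(path) else "disallowed")
```

Because the match is a simple path prefix, every URL under `/board` is covered, including the thread and listing paths mentioned earlier in this topic.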

                                      Jeorj Euler — 9 years ago(February 07, 2017 06:31 AM)

                                      Bump.
